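Preface to the Reprint Edition


This book is a monograph on empirical research in which time series analysis is applied to
real problems. It is divided into two parts. Part One briefly introduces the basic theory and
methods of time series analysis, which are the background needed to follow the case studies
in the book. Part Two consists of the case studies, from which the reader can see how widely
time series analysis is applied in practice and how it becomes the core tool for solving a
variety of problems. The cases include: how Chinese scientists discovered the rings of Uranus
from their own observation records; how filtering theory was applied to gravity prospecting
in the East China Sea and the Yellow Sea; how spectral analysis distinguishes the EEG
characteristics of children with Down syndrome; how the K-L information of multivariate
spectra was applied to testing the physiological characteristics of outstanding pilots; how
hidden periodicity analysis revealed that an isolated pituitary gland still has endocrine
rhythms; how prediction theory was applied to modelling and forecasting in meteorology; and
many other interesting and real research cases. These research results earned the author the
National Natural Science Award of China and a number of other awards at home and abroad.

By studying this book the reader can learn not only the basic theory and methods of time
series analysis but, more importantly, "how to turn a practical problem into a mathematical
problem" and then solve it with the theory and methods of mathematics and statistics,
including the complete process of finally returning to practice and checking the result
against experimental data.

Since its publication, the book has been well received by readers inside and outside the
university and by many well-known statisticians (see the comments by S. N. Gupta and others
on the back cover). It can serve as a teaching reference or supplementary textbook for
undergraduate and graduate students in applied time series analysis, and it is also a
valuable reference for applied statisticians and for scientists and engineers in many fields.


Xie Zhongjie

School of Mathematical Sciences, Peking University
November 2006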



Preface

In 1958, Chinese scientists, technologists, mathematicians, etc., in answer to their
Government's appeal, went out of colleges and institutes with an ardent zeal for
reconstructing their motherland and joined a movement of integrating theory and
practice to do real practical work in factories and farmlands. This nation-wide movement
gave tremendous impetus to further developments of sciences and technologies,
particularly mathematics. The author, at that time, was a fourth-year undergraduate
in a five-year course majoring in mathematics at Peking University. He
and two other classmates, under the guidance of Professor Chiang Tse-pei, went out
of the University to take part in a radar project to analyze the performance of filters
using the knowledge of spectral analysis they had learned in class. It was from this occasion
that the author developed an affinity for applied research.

The real, solid and all-front development of applied probability and statistics in
China started after the promulgation of a policy of reform and opening to the outside
world by the Government in the 1980s. In the Second National Conference of the
Chinese Society of Probability and Statistics, both Chairman T. Chiang and
Vice-Chairman Z. Wei stressed the urgent need for applied research in probability and
statistics with practical applications as the primary motivation. Their views were
warmly received by the audience. In a span of only a few years, applied research in
probability and statistics flourished first in quantity and then in quality, and has
now attained a fairly high level. A group of devoted probabilists and statisticians
is gradually forming, so how to train good researchers with an emphasis on applications
has become the key item on the agenda.

With the encouragement of my colleagues in the Department of Probability and
Statistics, especially the Chairman, Professor Chen Jiading, I inaugurated a course
on “Case Studies of Applied Probability and Statistics” for second-year postgraduate
students in 1987. I discussed in class some of my case studies that were
closely related to the basic principles and had achieved good results. My aim was
that a student, after a year's training, should understand how abstract mathematical
concepts are related to practical problems, and learn how an applied mathematician
thinks and what methodology should be adopted to solve a practical problem.
In class, I never refrained from talking about my experiences of failure. The students
liked my class. Perhaps it was such experiences that won me the students' appreciation.
In 1987, I was invited by Linz Kappler University of Austria to lecture on
a course “Applied Time Series Analysis” to postgraduates, who heartily welcomed
it. Many friends and colleagues, upon learning of my achievement, encouraged me
to write a book on “The Case Studies of Applied Time Series”; Professor Yan Shi-jian,
former Chairman of the Chinese Society of Probability and Statistics, gave
the biggest push. Through the kind arrangement of Mr. Xu Jiagu of World Scientific
Publishing, the book is now published. I, as the author of the book, wish to thank
them all and say the following words to the reader.

1. I must have a word of warning for scientists and technologists who, having
worked with various software packages on a computer for too long, may not have
bothered to find out the relevant theoretical background and/or premises, as they
may then arrive at wrong conclusions. In order to help readers who are not time-series
specialists to better understand the theoretical basis of these methods, I have
collected a number of them, hitherto scattered in different books and journals, and
explained them either in the first part of the book or when discussing the appropriate
case studies. The time-series specialists, however, may skip over these explanations
without loss of understanding of the context.

2. Nearly every research project is constrained by a time limit, and getting the solution
done before the time is up often becomes the goal of the project. It is well known that
the solutions of many practical problems are not unique. A solution given here in
the book is often only one of the possible solutions, not necessarily the best. Many
problems could have been done better, but shortage of time prevented me from
indulging in a search for the best solutions.

3. This book is intended as a reference book in applied statistics, though it actually 
consists of brief reports of my own research. It may be used as a source book for 
student seminars. 

Besides the case studies listed in the book, I have also done some other research
closely related to time series. For instance, I have collaborated with some communication
engineers to study various methods of reducing intersymbol interference
in a troposcatter communication system; such methods are quite closely connected
with some theories and methods of time series. In oil exploration, predicting
permeability rates at points in the upper space over the oil fields from a few
deep-well data (e.g. permeability data) may be treated as a problem of modelling and
prediction of a spatial series. Intermodulation analyses in satellite communication
and predictions of workers' hearing loss due to factory noise are all related to
nonlinear models. Owing to the limited time available, such topics have not been
included here.

... Zong-shu who not only encouraged me to write the book, but also kindly consented ...

I also owe Messrs. Zheng Pingping and Zhang Dabao heavily for their patient
undertaking of the tedious work of typing and I thank them for their seriousness ...

Finally, I wish to express my hearty thanks to the International Center, Department
of Mathematics, Waseda University, particularly to Professors T. Kusama ...
PART ONE


AN INTRODUCTION TO
THE THEORY AND METHODS OF
TIME SERIES ANALYSIS

CHAPTER 1

Theory of Stationary Time Series


In this chapter, we shall introduce some basic ideas and models of time series
which are very often used in applications. One of the most important concepts in
time series is stationarity, even though observations recorded in practice are not
always stationary. The reader will find that many techniques and theories
for analyzing nonstationary data series are based on the theory and methods of
stationary time series.

The ARMA series is the most important stationary time series. It plays a central
role both in theory and in applications, since the ARMA model not only covers many
problems in diverse fields but is also related to deep mathematical topics
such as rational spectral functions, Markovian extension problems, state
space models, etc. Accordingly, we shall discuss the ARMA model in some detail.

Some basic laws of large numbers, which are particularly important for the estimation
theory and techniques in the statistical analysis of time series, are also introduced
in this chapter.


§1.1 The Definition of Stationary Stochastic Processes 

The reader is presumed to have a basic knowledge of single and multiple random
variables and their distribution functions. In practice, however, a great many problems
involve infinitely many random variables; at least for the convenience of theoretical
analysis, we need not restrict ourselves to a finite number of random variables.
Some examples will be given in the sequel.

Definition 1.1 (Stochastic Processes). Suppose that $(\Omega, \mathcal{F}, P)$ is a probability
space, and $T$ is an index set. If for any $t \in T$ there exists a random variable $\xi_t(\omega)$
defined on $(\Omega, \mathcal{F}, P)$, then the family of random variables $\{\xi_t(\omega), t \in T\}$ will be
called a stochastic process.

In Definition 1.1, the index set $T$ may be understood as any set, real or complex,
finite or infinite, countable or uncountable, etc. In this book, $T$ generally represents
the time index set, such as $T = \{t : t = 1, 2, \ldots\}$ or $T = \{t : a \le t \le b\}$; we
denote the real line by $R_1$ and the set of integers by $Z$. In particular, when $T$ is a discrete
time index set, we shall call $\{\xi_t(\omega), t \in T\}$ a time series.

The random variables $\xi_t(\omega)$, $t \in T$, in Definition 1.1 should be understood as
complex-valued variables in general, so the moments are defined as

$$ E\xi_t = E\xi_t^{(R)}(\omega) + i\,E\xi_t^{(I)}(\omega), \tag{1.1} $$

+ - u.2) 

when (1.1), (1.2) exist, where $\xi_t^{(R)}(\omega)$, $\xi_t^{(I)}(\omega)$ are the real and imaginary parts of
$\xi_t(\omega)$ respectively.

When $T = \{t : a \le t \le b\}$, our convention is to denote a realization of
$\{\xi_t(\omega), t \in T\}$ by $\{X_t(\omega), t \in T\}$, which can be considered as a function of $t$. In
practical records, different realizations $X_t(\omega_1), X_t(\omega_2), \ldots, X_t(\omega_n)$, $t \in T$, may not
coincide with each other. In general, $\xi_t(\omega)$, $t \in T$, has a dual character: on
one hand, the realization given by the recorder in a practical problem looks like a
real curve or a deterministic function; on the other hand, theoretically, we always
consider the observed sample as a stochastic process, i.e. its “randomness” still
exists in the mathematical analysis.

In the sequel, we write a stochastic process succinctly as $\{\xi_t, t \in T\}$, omitting the
element $\omega$.

Definition 1.2 (Gaussian Stochastic Process). Suppose that $\{\xi_t, t \in T\}$ is
a real-valued stochastic process. If for any $t_1, t_2, \ldots, t_n \in T$ the $n$-dimensional
characteristic function of $\{\xi_{t_1}, \xi_{t_2}, \ldots, \xi_{t_n}\}$ can be represented in the form

$$ f(u) = \exp\{ i a'u - \tfrac{1}{2} u'\Sigma u \}, \tag{1.3} $$

where $a = (a_1, a_2, \ldots, a_n)'$ is a real vector of $n$ dimensions and $\Sigma$ is a real, non-negative
definite symmetric matrix, then $\{\xi_t, t \in T\}$ will be called a Gaussian process or a
Normal process.

It is clear that if the covariance matrix is positive definite, $\Sigma > 0$, then the
distribution function determined by (1.3) is an $n$-dimensional normal distribution
$N(a, \Sigma)$ with probability density

$$ p(x) = (2\pi)^{-n/2} (\det\Sigma)^{-1/2} \exp\{ -\tfrac{1}{2} X'\Sigma^{-1}X \}, \tag{1.4} $$

where $X = x - a$. Now, if $\Sigma \ge 0$, $\det\Sigma = 0$ may occur, so we cannot expect
(1.4) always to hold, but the following derivation shows that (1.3) is still
the characteristic function of an $n$-variate vector.

In fact, we can put

$$ \Sigma_N = \Sigma + I/N, \tag{1.5} $$

where $N > 0$ is a sufficiently large integer and $I$ is the $n \times n$ unit matrix. Then for any
non-zero $n$-variate vector $\alpha \in R_n$ we have

$$ \alpha'\Sigma_N\alpha = \alpha'\Sigma\alpha + \tfrac{1}{N}\alpha'\alpha > 0, $$

which shows that

$$ f_N(u) = \exp\{ i a'u - \tfrac{1}{2} u'\Sigma_N u \} = f(u)\exp\{ -\tfrac{1}{2N} u'u \} \tag{1.6} $$

is a characteristic function. Now, for $u$ in any bounded set $U$, $f_N(u)$ converges
uniformly to $f(u)$, and $f(u)$ is continuous at $u = 0$. According to the limit
theorem for characteristic functions (see Cramer (1946)), we know that $f(u)$ is a
characteristic function of random variables.

Now suppose that $\{\xi_t, t \in T\}$ is a complex-valued stochastic process. We shall
call $\{\xi_t, t \in T\}$ a Gaussian process or Normal process if for any integer $n > 0$, the
joint $2n$-variates

$$ (\xi_{t_1}^{(R)}, \xi_{t_1}^{(I)}, \xi_{t_2}^{(R)}, \xi_{t_2}^{(I)}, \ldots, \xi_{t_n}^{(R)}, \xi_{t_n}^{(I)}) $$

of the real and imaginary parts of the complex variates $(\xi_{t_1}, \ldots, \xi_{t_n})$ are Gaussian, i.e.
their characteristic function possesses the form of (1.3).

The following theorem gives the existence of Gaussian processes; its proof can
be found in Doob (1953).

Theorem 1.1 (Existence of Gaussian Process). Let $T$ be an index set, $a_t$
a complex function defined on $T$, and $\sigma_{t,s}$ a bivariate function on $T \times T$. If the
following conditions:

1. $\sigma_{t,s} = \overline{\sigma_{s,t}}$, $\forall (s,t) \in T \times T$,

2. For any $t_1, t_2, \ldots, t_n \in T$,

   $\Sigma = (\sigma_{t_i,t_j})_{1 \le i,j \le n} \ge 0$ (non-negative definite),

are fulfilled, then there exists a complex Gaussian process $\{\xi_t, t \in T\}$ such that

$$ E\xi_t = a_t, \qquad E(\xi_t - a_t)\overline{(\xi_s - a_s)} = \sigma_{t,s}, \quad \forall (t,s) \in T \times T. $$

This theorem is very important both in theoretical and methodological research
in time series analysis.
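
As a numerical illustration of Theorem 1.1 in the simplest setting, the sketch below draws the
finite-dimensional vector $(\xi_{t_1}, \ldots, \xi_{t_n})$ of a real Gaussian process from a prescribed
mean and covariance. It is not taken from the text: the Cholesky construction, the helper names
`cholesky` and `sample_gaussian`, and the choice $\sigma_{t,s} = e^{-|t-s|}$ are assumptions made
for the example, and the matrix is assumed to be positive definite (the theorem itself only
requires non-negative definiteness).

```python
import math
import random

def cholesky(S):
    """Lower-triangular L with L L^T = S, for a symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def sample_gaussian(mean, S, rng):
    """One draw of (xi_{t_1}, ..., xi_{t_n}) with mean vector `mean` and covariance S."""
    L = cholesky(S)
    z = [rng.gauss(0.0, 1.0) for _ in mean]          # independent N(0, 1) variables
    return [m + sum(L[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mean)]

# Hypothetical choice: sigma_{t,s} = exp(-|t - s|) at times t = 0, 1, 2, with a_t = 0.
times = [0, 1, 2]
S = [[math.exp(-abs(t - s)) for s in times] for t in times]
rng = random.Random(0)
draws = [sample_gaussian([0.0, 0.0, 0.0], S, rng) for _ in range(20000)]
# Empirical covariance of (xi_0, xi_1); it should be close to sigma_{0,1} = exp(-1).
emp_cov_01 = sum(d[0] * d[1] for d in draws) / len(draws)
```

Writing $\xi = a + Lz$ with $LL' = \Sigma$ gives $E(\xi - a)(\xi - a)' = \Sigma$, which is why the
empirical covariance approaches the prescribed $\sigma_{t,s}$.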
Definition 1.3 (Stationary Process). Let $\{\xi_t, t \in T\}$ be a stochastic process
with second order moment, $E|\xi_t|^2 < +\infty$, $t \in T$. $\xi_t$ is said to be a stationary
process in the wide sense if the following conditions are satisfied:

1. $E\xi_t = a$, $\forall t \in T$, (1.7)

2. $E(\xi_t - E\xi_t)\overline{(\xi_s - E\xi_s)} = R(t - s)$, $\forall t, s \in T$. (1.8)


Condition (1.7) shows that the mean of $\xi_t$ is invariant under shifts
of $t$. In practical problems, the recorded curves $\{X_t, t \in T\}$ look like a random
vibration about a constant level $E\xi_t$ (see Fig. 1.1). Condition (1.8) can also
be rewritten as

$$ E(\xi_{t+\tau} - E\xi_{t+\tau})\overline{(\xi_t - E\xi_t)} = R(\tau) \tag{1.9} $$

for any $t$, $t + \tau \in T$.



Fig. 1.1 Realization of stochastic processes 

Accordingly, the second condition in Definition 1.3 can be understood as saying that
the covariance function of $\xi_{t+\tau}$ vs. $\xi_t$ is invariant under shifts of $t$. This means
that the statistical linear correlation of $(\xi_{t+\tau}, \xi_t)$ depends only on the spacing $\tau$
and is free from the initial time $t$.

Generally, we call

$$ B(\tau) = E(\xi_{t+\tau}\bar{\xi}_t) \tag{1.10} $$

the correlation function of $\xi_t$, $t \in T$, and $R(\tau)$ the covariance function. Without loss of
generality, we shall always assume that the stationary process in the wide sense has
zero mean, $E\xi_t = 0$. In fact, we can always put

$$ \eta_t = \xi_t - E\xi_t = \xi_t - a; \tag{1.11} $$

then $\eta_t$ is stationary in the wide sense with zero mean and has the same covariance
function as $\xi_t$:

$$ R_\eta(\tau) = E\eta_{t+\tau}\bar{\eta}_t = E(\xi_{t+\tau} - a)\overline{(\xi_t - a)} = R_\xi(\tau). \tag{1.12} $$

When $\xi_t$ has zero mean, the covariance function coincides with the correlation
function:

$$ R_\xi(\tau) = B_\xi(\tau). \tag{1.13} $$

Definition 1.4 (Stationary Process in the Strict Sense). Suppose that
$\{\xi_t, t \in T\}$ is a stochastic process. It is said to be stationary in the strict sense if,
for any $t_1, \ldots, t_n \in T$ and $t_1 + \tau, t_2 + \tau, \ldots, t_n + \tau \in T$, the joint distribution
functions of $(\xi_{t_1}, \xi_{t_2}, \ldots, \xi_{t_n})$ and $(\xi_{t_1+\tau}, \xi_{t_2+\tau}, \ldots, \xi_{t_n+\tau})$
are identical, i.e. for any $(x_1, \ldots, x_n) \in R_n$,

$$ F_{t_1,\ldots,t_n}(x_1, \ldots, x_n) = F_{t_1+\tau,\ldots,t_n+\tau}(x_1, \ldots, x_n). \tag{1.14} $$


Condition (1.14) says that the joint distribution function of the $n$-variate
$(\xi_{t_1}, \ldots, \xi_{t_n})$ of a stationary process in the strict sense is invariant under
a common shift of the index $t$. By contrast, conditions (1.7) and (1.8) for
wide sense stationarity concern the invariance of the first and second
moments of $\xi_t$ rather than of its distribution functions.

A stationary process in the strict sense need not be a stationary
process in the wide sense, since condition (1.14) does not ensure that its first or
second moment exists. Conversely, a wide sense stationary process may fail to be
strict sense stationary in the sense of (1.14). The following example illustrates this.

Let $\xi_1$ be a random variable uniformly distributed on $[-1, 1]$ and independent of
$\xi_2, \xi_3, \ldots$, which are i.i.d. normally distributed random variables with mean $0$
and variance $\sigma^2 = 1/3$. Then $\xi_i$, $i = 1, 2, \ldots$, is stationary in the wide sense:

$$ E\xi_i = 0, \quad i = 1, 2, \ldots, \qquad R_\xi(\tau) = \begin{cases} 1/3, & \tau = 0, \\ 0, & \tau \ne 0. \end{cases} \tag{1.15} $$

But $\xi_i$ is not a strict sense stationary series, since for $n = 1$, $i = 1$, $\tau = 1$ the
distribution of $\xi_i$ is $U[-1, 1]$ while that of $\xi_{i+\tau}$ is $N(0, 1/3)$.
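
The example can be checked by simulation. The sketch below is an illustration rather than part
of the text; the sample size and seed are arbitrary choices. It draws many independent copies of
$\xi_1 \sim U[-1, 1]$ and of $\xi_2 \sim N(0, 1/3)$ and compares their variances and the fraction of
values falling outside $[-1, 1]$.

```python
import random

rng = random.Random(1)
n = 50000
xi_1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]               # copies of xi_1 ~ U[-1, 1]
xi_2 = [rng.gauss(0.0, (1.0 / 3.0) ** 0.5) for _ in range(n)]   # copies of xi_2 ~ N(0, 1/3)

def variance(xs):
    """Sample variance about the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

var_1, var_2 = variance(xi_1), variance(xi_2)
# Same second moments, different laws: the uniform variable never leaves [-1, 1],
# while the normal variable does so with positive probability.
frac_outside_1 = sum(abs(x) > 1.0 for x in xi_1) / n
frac_outside_2 = sum(abs(x) > 1.0 for x in xi_2) / n
```

Both variances come out near $1/3$, so the first two moments cannot distinguish $\xi_1$ from
$\xi_2$; only the distribution functions do.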

However, a strict sense stationary process $\xi_t$, $t \in T$, will be a wide sense stationary
process if for any $t \in T$,

$$ E|\xi_t|^2 < +\infty \tag{1.16} $$

holds.

First of all, the existence of the covariance function of $\xi_t$ follows from
the Schwarz inequality:

$$ |E\xi_{t+\tau}\bar{\xi}_t| \le (E|\xi_{t+\tau}|^2\,E|\xi_t|^2)^{1/2} < +\infty. $$

We only show the assertions (1.7) and (1.8) for a real-valued
process, since the derivation in the complex case is quite similar.

Let $F_t(x)$ be the distribution function of $\xi_t$, $t \in T$, and $F_{t_1,t_2}(x, y)$ be the joint
distribution function of $(\xi_{t_1}, \xi_{t_2})$; then, by the shift invariance
of the distribution functions, we have

$$ E\xi_t = \int_{R_1} x\,dF_t(x) = \int_{R_1} x\,dF_0(x) = a, $$

$$ E\xi_{t+\tau}\xi_t = \iint_{R_2} xy\,dF_{t+\tau,t}(x, y) = \iint_{R_2} xy\,dF_{\tau,0}(x, y) = B(\tau), \tag{1.17} $$

so that $R(\tau) = B(\tau) - a^2$ also depends only on $\tau$, which shows that $\xi_t$ is stationary
in the wide sense.

Now suppose that $\xi_t$, $t \in T$, is a Gaussian process; then $\xi_t$ is stationary in the
strict sense if and only if it is stationary in the wide sense.

In fact, if $\xi_t$ is stationary in the strict sense, then it is also stationary in the wide
sense, since a Gaussian process possesses first and second order moments.

Conversely, if $\xi_t$ is stationary in the wide sense, then for $t_1, \ldots, t_n \in T$ and
$t_1 + \tau, \ldots, t_n + \tau \in T$ we have

$$ E\xi_{t_i} = a = E\xi_{t_i+\tau}, \quad t_i \in T, $$

$$ E(\xi_{t_i} - a)\overline{(\xi_{t_j} - a)} = R(t_i - t_j) = E(\xi_{t_i+\tau} - a)\overline{(\xi_{t_j+\tau} - a)}. \tag{1.18} $$

(1.18) shows that the means and covariance functions of $(\xi_{t_1+\tau}, \ldots, \xi_{t_n+\tau})$ and
$(\xi_{t_1}, \ldots, \xi_{t_n})$ are the same; hence their characteristic functions (see (1.3)) are
identical, i.e. they have the same distribution functions.

Since we shall emphasize the study of stationary processes in the wide sense in the
sequel, we shall simply call them “stationary processes” unless specified otherwise.

Now, we will give some simple examples of stationary processes that are very
useful in theory and applications.


Example 1.1 (White Noise). Let $\xi_1, \xi_2, \ldots$ be an i.i.d. series with

$$ \begin{cases} E\xi_i = a, \\ E(\xi_i - a)\overline{(\xi_j - a)} = \sigma^2\delta_{i,j}, \quad \sigma > 0, \end{cases} \tag{1.19} $$

where $\delta_{i,j} = 1$ if $i = j$ and $= 0$ otherwise. Then $\xi_t$ is a stationary series in the
strict sense as well as in the wide sense.

We shall call $\{\xi_t, t \in T\}$ a standard i.i.d. series when $\sigma = 1$, and call a series
$\{\eta_t, t \in T\}$ white noise if it satisfies only (1.19).


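Example 1.2 (Sinusoid with Random Phase). Let $\theta_k$, $k = 1, 2, \ldots, m$, be
i.i.d. uniformly distributed random variables on $[-\pi, \pi]$; then

$$ \xi_t = \sum_{k=1}^{m} A_k \exp\{i(\omega_k t + \theta_k)\}, \quad t \in R_1 \text{ or } t \in Z, \tag{1.20} $$

is a stationary process, where $A_k$, $\omega_k$, $k = 1, 2, \ldots, m$, are positive constants.

In fact, the following equalities show the stationarity of $\xi_t$:

$$ E\xi_t = \sum_{k=1}^{m} A_k e^{i\omega_k t}\,Ee^{i\theta_k} = 0, \tag{1.21} $$

$$ E\xi_t\bar{\xi}_s = \sum_{k=1}^{m}\sum_{l=1}^{m} A_k A_l\,E e^{i(\omega_k t + \theta_k) - i(\omega_l s + \theta_l)} = \sum_{k=1}^{m} A_k^2 e^{i\omega_k(t-s)} = R(t - s), \tag{1.22} $$

where $t, s \in R_1$ or $t, s \in Z$. The second-to-last equality is derived from the fact that

$$ E e^{i\theta_k} e^{-i\theta_l} = \begin{cases} E e^{i\theta_k}\,E e^{-i\theta_l} = 0, & k \ne l, \\ 1, & k = l. \end{cases} $$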

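Example 1.3 (Sinusoid with Random Amplitude). Let $\xi_k$, $k = 1, 2, \ldots, m$, be
independent random variables with $E\xi_k = 0$, $E|\xi_k|^2 = \sigma_k^2$; then

$$ \eta_t = \sum_{k=1}^{m} \xi_k e^{i\omega_k t}, \quad t \in R_1 \text{ or } t \in Z, \tag{1.23} $$

is a stationary process, where $\omega_k$, $k = 1, \ldots, m$, are positive constants.

Indeed, the mean of $\eta_t$ is zero for $t \in R_1$ or $t \in Z$, and the covariance function is

$$ E\eta_t\bar{\eta}_s = \sum_{k=1}^{m}\sum_{l=1}^{m} E\xi_k\bar{\xi}_l\,e^{i(\omega_k t - \omega_l s)} = \sum_{k=1}^{m} \sigma_k^2 \exp\{i\omega_k(t - s)\} = R(t - s). \tag{1.24} $$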
The reader can easily show that the following proposition is true:


Suppose that $\xi_k$, $k = 1, 2, \ldots, m$, are random variables with $E\xi_k = 0$,
$E|\xi_k|^2 < +\infty$, and that $\omega_k \ne \omega_l$ for $k \ne l$; then $\eta_t = \sum_{k=1}^{m} \xi_k e^{i\omega_k t}$
is stationary if and only if

$$ E\xi_k\bar{\xi}_l = 0, \quad k \ne l. $$

Readers can also prove without difficulty that when $\{\theta_k\}$, $\{\xi_k\}$ are independent
random variables with distributions as mentioned above, then

$$ \eta_t = \sum_{k=1}^{m} \xi_k e^{i(\omega_k t + \theta_k)}, \quad t \in R_1 \text{ or } t \in Z, \tag{1.25} $$

is a stationary process.

It is enlightening to see from the following proposition why a number of stationary
processes are associated with the exponential functions: Let $f(t)$ be a continuous
function on $R_1$, $\xi$ a random variable with $E\xi = 0$, $E|\xi|^2 < +\infty$, and put
$\xi_t = \xi f(t)$, $t \in R_1$; then $\xi_t$ is stationary if and only if

$$ f(t) = e^{i\lambda t}. \tag{1.26} $$


We leave the proof to readers.

Example 1.4 (White Noise Filtration). Suppose that $\{\varepsilon_t, t \in Z\}$ is a white
noise series, and $h_0, h_1, \ldots, h_m$ are constants; then

$$ \xi_t = \sum_{k=0}^{m} h_k\varepsilon_{t-k}, \quad t \in Z, \tag{1.27} $$

is a stationary process.

(1.27) shows that $\xi_t$ is the output of a digital filter whose input is a white
noise series. An engineering diagram is given in Fig. 1.2.

For real $h_k$ and a white noise with zero mean and unit variance, the covariance
function of $\xi_t$ is

$$ R_\xi(t - s) = \sum_{k=0}^{m}\sum_{l=0}^{m} h_k h_l\,E\varepsilon_{t-k}\varepsilon_{s-l} = \sum_{k=0}^{m} h_k h_{(t-s)+k} = \sum_{k=0}^{\infty} h_k h_{(t-s)+k}, \tag{1.28} $$

where $h_\mu = 0$ when $\mu < 0$ or $\mu > m$.
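
Formula (1.28) can be compared with a simulated filter output. The sketch below is an
illustration under stated assumptions: the coefficients `h`, the Gaussian choice for the noise,
the unit noise variance, and the helper names `R_theory` and `R_sample` are not from the text.

```python
import random

h = [1.0, 0.6, -0.3]   # filter coefficients h_0, ..., h_m (hypothetical values)
m = len(h) - 1

def R_theory(tau):
    """(1.28) with unit-variance noise: sum_k h_k h_{k+tau}, with h_mu = 0 outside 0..m."""
    tau = abs(tau)
    return sum(h[k] * h[k + tau] for k in range(m + 1 - tau)) if tau <= m else 0.0

rng = random.Random(3)
n = 100000
eps = [rng.gauss(0.0, 1.0) for _ in range(n + m)]
# xi_t = sum_k h_k eps_{t-k}; the index is shifted by m so that t - k never goes negative.
xi = [sum(h[k] * eps[t + m - k] for k in range(m + 1)) for t in range(n)]

def R_sample(tau):
    """Sample covariance at lag tau >= 0 (the mean is zero by construction)."""
    return sum(xi[t + tau] * xi[t] for t in range(n - tau)) / (n - tau)
```

For these coefficients `R_theory` gives 1.45, 0.42 and -0.3 at lags 0, 1 and 2, and zero beyond
lag $m = 2$; `R_sample` reproduces the same pattern up to sampling error.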
Fig. 1.2 Digital filtration of white noise


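Example 1.5 (Non-Linear Stationary Series). Let $\varepsilon_t$, $t = 0, \pm 1, \pm 2, \ldots$, be
the real standard i.i.d. noise and $\alpha > 0$; then

$$ \xi_t = \varepsilon_t + \alpha\varepsilon_{t-1}\varepsilon_{t-2} \tag{1.29} $$

is a stationary series.

In fact,

$$ E\xi_t = E\varepsilon_t + \alpha E\varepsilon_{t-1}E\varepsilon_{t-2} = 0, $$

$$ E\xi_t\xi_s = E(\varepsilon_t + \alpha\varepsilon_{t-1}\varepsilon_{t-2})(\varepsilon_s + \alpha\varepsilon_{s-1}\varepsilon_{s-2}) $$
$$ = E\varepsilon_t\varepsilon_s + \alpha E\varepsilon_{t-1}\varepsilon_{t-2}\varepsilon_s + \alpha E\varepsilon_t\varepsilon_{s-1}\varepsilon_{s-2} + \alpha^2 E\varepsilon_{t-1}\varepsilon_{t-2}\varepsilon_{s-1}\varepsilon_{s-2}, $$

where

$$ E\varepsilon_{t-1}\varepsilon_{t-2}\varepsilon_s = 0, \qquad E\varepsilon_t\varepsilon_{s-1}\varepsilon_{s-2} = 0 \quad \text{for all } t, s, $$

$$ E\varepsilon_t\varepsilon_s = \begin{cases} 1, & \text{when } t = s, \\ 0, & \text{when } t \ne s, \end{cases} \qquad E\varepsilon_{t-1}\varepsilon_{t-2}\varepsilon_{s-1}\varepsilon_{s-2} = \begin{cases} E\varepsilon_{t-1}^2 E\varepsilon_{t-2}^2 = 1, & \text{when } s = t, \\ 0, & \text{otherwise,} \end{cases} $$

so that

$$ E\xi_t\xi_s = \begin{cases} 1 + \alpha^2, & \text{when } t = s, \\ 0, & \text{otherwise.} \end{cases} $$

Hence, $\xi_t$ itself is a white noise series.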
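
The following simulation sketch illustrates Example 1.5. It is not from the text: the value of
$\alpha$, the Gaussian choice for $\varepsilon_t$ and the helper `sample_cov` are assumptions. Besides
the covariances computed above, it also looks at the lag-one covariance of the squares, a quantity
the text does not discuss; for Gaussian noise a direct calculation gives $2\alpha^2 + 2\alpha^4$ for
it, so the series is uncorrelated without being independent.

```python
import random

alpha = 0.8                      # hypothetical value of the constant alpha > 0
rng = random.Random(4)
n = 100000
eps = [rng.gauss(0.0, 1.0) for _ in range(n + 2)]
# xi_t = eps_t + alpha * eps_{t-1} * eps_{t-2}, stored with an index shift of 2.
xi = [eps[t + 2] + alpha * eps[t + 1] * eps[t] for t in range(n)]

def sample_cov(x, tau):
    """Average of x_{t+tau} * x_t; equals the sample covariance when x has mean zero."""
    return sum(x[t + tau] * x[t] for t in range(len(x) - tau)) / (len(x) - tau)

cov0, cov1, cov2 = (sample_cov(xi, tau) for tau in (0, 1, 2))
# Centred squares xi_t^2 - (1 + alpha^2): their lag-one covariance is clearly non-zero,
# which shows dependence between xi_t and xi_{t+1} despite zero correlation.
sq = [x * x - (1 + alpha ** 2) for x in xi]
cov_sq1 = sample_cov(sq, 1)
```

Here `cov0` is close to $1 + \alpha^2 = 1.64$ while `cov1` and `cov2` are close to zero, as the
derivation predicts.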


Example 1.6 (Random Telegram Process). Let $\xi_t$, $t \in R_1$, be a real process taking
the two values $\{0, 1\}$, with

$$ P\{\xi_t = 1\} = P\{\xi_t = 0\} = \tfrac{1}{2}, $$

and suppose that the number $\mu_T$ of changes of state of the waveform in an interval
$[T_1, T_2]$ is independent of $\xi_t$ and possesses the Poisson distribution, i.e.

$$ P\{\mu_T = k\} = \frac{(\lambda_0 T)^k}{k!} e^{-\lambda_0 T}, \quad k = 0, 1, \ldots, \tag{1.30} $$

where $T = T_2 - T_1$ and $\lambda_0 > 0$; then $\xi_t$ is a stationary process.

In fact, we only need to check conditions (1.7) and (1.8) respectively.

1. $E\xi_t = \tfrac{1}{2}$. (1.31)

2. $E\xi_{t+\tau}\xi_t = P\{\xi_t = 1\}\,P\{\mu_{|\tau|} \text{ is even}\}$

$$ = \frac{1}{2} \sum_{k \ge 0,\ k \text{ even}} e^{-\lambda_0|\tau|} \frac{(\lambda_0|\tau|)^k}{k!} = \frac{1}{4}\bigl(1 + e^{-2\lambda_0|\tau|}\bigr) = B(\tau). \tag{1.32} $$

So,

$$ R(\tau) = B(\tau) - (E\xi_t)^2 = \frac{1}{4} e^{-2\lambda_0|\tau|}, \quad \tau \in R_1, \tag{1.33} $$

is free from the initial index $t$.
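
The step from the sum over even $k$ in (1.32) to the closed form (1.33) can be verified
numerically. The sketch below is illustrative: the intensity `lam0`, the truncation of the
Poisson series at 60 terms and the function names are assumptions for the example.

```python
import math

lam0 = 1.5   # hypothetical intensity lambda_0 > 0

def prob_even_changes(tau, terms=60):
    """P{mu_|tau| is even}: sum over even k of e^{-x} x^k / k!, with x = lam0 * |tau|."""
    x = lam0 * abs(tau)
    return sum(math.exp(-x) * x ** k / math.factorial(k) for k in range(0, terms, 2))

def B(tau):
    """(1.32): E xi_{t+tau} xi_t = P{xi_t = 1} * P{even number of changes}."""
    return 0.5 * prob_even_changes(tau)

def R(tau):
    """(1.33): covariance B(tau) - (E xi_t)^2, with E xi_t = 1/2."""
    return B(tau) - 0.25
```

The sum over even $k$ equals $e^{-x}\cosh x = \tfrac{1}{2}(1 + e^{-2x})$ with $x = \lambda_0|\tau|$,
so `R(tau)` agrees with $\tfrac{1}{4}e^{-2\lambda_0|\tau|}$ to rounding error.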


§1.2 The Spectral Representation of Covariance Function 

In §1.1, we discussed the stationary process and its covariance function. In
this section, we shall investigate the harmonic analysis of the covariance function
and introduce some very important results on the spectrum of a stationary process.

First of all, the following theorem is very useful for understanding the elementary
properties of the covariance function (C.F.) $R(\tau)$.

Theorem 1.2 (Elementary Property of C.F.). Let $R(\tau)$, $\tau \in T$, be the C.F. of
a stationary process $\xi_t$; then the following hold:

1. $R(0) \ge 0$, (1.34)

2. $R(-\tau) = \overline{R(\tau)}$, $\tau \in T$, (1.35)

3. $|R(\tau)| \le R(0)$, $\tau \in T$, (1.36)

4. For any $t_1, \ldots, t_n \in T$, the matrix $(R(t_i - t_j))_{1 \le i,j \le n}$ is non-negative definite.

The proof of the theorem is very simple. Put $a = E\xi_t$, $t \in T$; then:

1. $R(0) = E|\xi_t - a|^2 \ge 0$. (1.37)

2. $R(-\tau) = E(\xi_{t-\tau} - a)\overline{(\xi_t - a)}$ (1.38)

   $= \overline{E(\xi_{t-\tau+\tau} - a)\overline{(\xi_{t-\tau} - a)}}$

   $= \overline{R(\tau)}$, $t, t + \tau \in T$. (1.39)

3. According to the well-known Schwarz inequality, we have

   $|R(\tau)| = |E(\xi_\tau - a)\overline{(\xi_0 - a)}| \le (E|\xi_\tau - a|^2)^{1/2}(E|\xi_0 - a|^2)^{1/2}$ (1.40)

   $= R(0)^{1/2}R(0)^{1/2} = R(0)$. (1.41)

4. For any complex vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$,

   $\alpha R\alpha^* = (\alpha_1, \alpha_2, \ldots, \alpha_n)(R(t_i - t_j))(\bar{\alpha}_1, \bar{\alpha}_2, \ldots, \bar{\alpha}_n)'$ (1.42)

   $= E\Bigl|\sum_{k=1}^{n} \alpha_k(\xi_{t_k} - a)\Bigr|^2 \ge 0$. (1.43)
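
The four properties of Theorem 1.2 can be checked numerically on a concrete covariance
function. The sketch below uses the covariance function of Example 1.3,
$R(\tau) = \sum_k \sigma_k^2 e^{i\omega_k\tau}$; the numerical values, the time points and the helper
`quadratic_form` are assumptions chosen for illustration.

```python
import cmath
import random

sigma2 = [1.0, 0.4]    # variances sigma_k^2 (hypothetical values)
omega = [0.5, 1.7]     # frequencies omega_k (hypothetical values)

def R(tau):
    """Covariance function of Example 1.3: sum_k sigma_k^2 exp{i omega_k tau}."""
    return sum(s * cmath.exp(1j * w * tau) for s, w in zip(sigma2, omega))

def quadratic_form(alpha, times):
    """alpha R alpha^* = sum_{i,j} alpha_i conj(alpha_j) R(t_i - t_j), as in (1.42)."""
    return sum(a_i * a_j.conjugate() * R(t_i - t_j)
               for a_i, t_i in zip(alpha, times)
               for a_j, t_j in zip(alpha, times))

rng = random.Random(5)
times = [0.0, 0.4, 1.0, 2.5, 7.0]
forms = []
for _ in range(200):
    # Random complex test vectors alpha = (alpha_1, ..., alpha_n).
    alpha = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in times]
    forms.append(quadratic_form(alpha, times))
```

Each entry of `forms` is real and non-negative up to rounding error, in line with (1.43); the
Hermitian symmetry (1.35) and the bound (1.36) can be read off `R` directly.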


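In Example 1.3, we obtained the following representation of the covariance
function

$$ R(\tau) = \sum_{k=1}^{m} \sigma_k^2 e^{i\omega_k\tau}, \quad \tau \in R_1 \text{ or } \tau \in Z, \tag{1.44} $$

for the sinusoid process

$$ \eta_t = \sum_{k=1}^{m} \xi_k e^{i\omega_k t}, \quad t \in R_1 \text{ or } t \in Z. \tag{1.45} $$

Now, if $\tau \in R_1$, then using the Fourier-Stieltjes integral we can rewrite (1.44)
as

$$ R(\tau) = \int_{R_1} e^{i\tau\lambda}\,dF(\lambda), \quad \tau \in R_1, $$

where

$$ dF(\lambda) = \sum_{k=1}^{m} \sigma_k^2\,\delta(\lambda - \omega_k)\,d\lambda. \tag{1.46} $$

If $\tau \in Z$ and $|\omega_k| \le \pi$, then we have

$$ R(\tau) = \int_{-\pi}^{\pi} e^{i\tau\lambda}\,dF(\lambda), \quad \tau \in Z, \tag{1.47} $$

where $dF(\lambda)$ is similar to (1.46) and $\lambda \in [-\pi, \pi]$.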

In Example 1.6, we obtained the representation

$$ R(\tau) = \frac{1}{4}\exp(-2\lambda_0|\tau|), \quad \tau \in R_1, \tag{1.48} $$

and now we can also rewrite it as

$$ R(\tau) = \int_{R_1} e^{i\tau\lambda}\,dF(\lambda), $$

where $dF(\lambda) = f(\lambda)\,d\lambda$, and

$$ f(\lambda) = \frac{\lambda_0}{2\pi(\lambda^2 + 4\lambda_0^2)}, \quad \lambda \in R_1. $$

For the white noise, either an i.i.d. series or an uncorrelated series, the covariance
function is

$$ R(k) = \begin{cases} \sigma^2, & k = 0, \\ 0, & k \ne 0, \end{cases} $$

and so we again have the integral representation of $R(k)$ as

$$ R(k) = \int_{-\pi}^{\pi} e^{ik\lambda}\,dF(\lambda), $$

where $dF(\lambda) = f(\lambda)\,d\lambda$, and

$$ f(\lambda) = \frac{\sigma^2}{2\pi}, \quad \lambda \in [-\pi, \pi]. \tag{1.49} $$

All of the results mentioned above are consequences of the following theorem:


Theorem 1.3 (Spectral Representation of Covariance Function). Suppose
that $\{R(k), k = 0, \pm 1, \ldots\}$ is the covariance function of a stationary series; then $R(k)$
can be represented as the following integral:

$$ R(k) = \int_{-\pi}^{\pi} e^{ik\lambda}\,dF(\lambda), \quad k = 0, \pm 1, \pm 2, \ldots, \tag{1.50} $$

where $dF$ is a finite measure on $[-\pi, \pi]$. On the Borel field of $[-\pi, \pi]$, the measure
is uniquely determined by $R(k)$.


Since the proof of this theorem can be found in many textbooks, such as
Gihman and Skorohod (1969), it is omitted here.

When $\xi_t$ is a stationary process with the covariance function $R(u)$, $u \in R_1$,
(1.50) becomes

$$ R(u) = \int_{R_1} e^{iu\lambda}\,dF(\lambda), \quad u \in R_1, \tag{1.51} $$

where $dF$ is a finite measure on $R_1$. On the Borel field of $R_1$, the measure is also
uniquely determined by $R(u)$.
We shall call $F(\lambda)$ the spectral function of the process $\xi_t$. If $dF(\lambda)/d\lambda = f(\lambda)$
exists a.e. on $[-\pi, \pi]$, then we call $f(\lambda)$ the spectral density function of the process
$\xi_t$. In the general case, the spectral function $F(\lambda)$ will consist of two parts, the
absolutely continuous function and the step function (see Fig. 1.3), i.e.

$$ F(\lambda) = \int_{-\pi}^{\lambda} f(\mu)\,d\mu + \sum_{\{k:\,\omega_k \le \lambda\}} \sigma_k^2, \quad \lambda \in [-\pi, \pi]. \tag{1.52} $$

A sufficient condition for the existence of the spectral density function $f(\lambda)$ is that
$R(k)$ be summable, i.e.

$$ \sum_{k=-\infty}^{\infty} |R(k)| < +\infty; \tag{1.53} $$

in such a case we have the integral representation for $R(k)$ as

$$ R(k) = \int_{-\pi}^{\pi} e^{ik\lambda} f(\lambda)\,d\lambda. \tag{1.54} $$

this means that (1.55) is the Fourier series of $f(\lambda)$, i.e.

$$ f(\lambda) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} R(k) e^{-ik\lambda}, \quad \lambda \in [-\pi, \pi], \tag{1.57} $$

and $R(k)$ is the Fourier coefficient of $f(\lambda)$,

$$ R(k) = \int_{-\pi}^{\pi} e^{ik\lambda} f(\lambda)\,d\lambda, \quad k = 0, \pm 1, \pm 2, \ldots \tag{1.58} $$

Usually, we call (1.57), (1.58) the Wiener-Khinchin formula. Since $dF(\lambda)$ is a
measure on $[-\pi, \pi]$, we always have

$$ f(\lambda) \ge 0, \quad \text{a.e. on } [-\pi, \pi]. \tag{1.59} $$

In the case of a stationary process with a continuous index $t$, the condition

$$ \int_{R_1} |R(u)|\,du < +\infty \tag{1.60} $$

ensures the existence of a non-negative function $f(\lambda)$ such that

$$ R(u) = \int_{R_1} e^{iu\lambda} f(\lambda)\,d\lambda, \quad u \in R_1, $$

and

$$ f(\lambda) = \frac{1}{2\pi} \int_{R_1} e^{-iu\lambda} R(u)\,du, \quad \lambda \in R_1. $$
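
As a check on the inversion formula, the sketch below evaluates
$\frac{1}{2\pi}\int e^{-iu\lambda}R(u)\,du$ numerically for the covariance function (1.48) of the
random telegram process and compares it with the spectral density
$\lambda_0 / (2\pi(\lambda^2 + 4\lambda_0^2))$ quoted earlier. The trapezoidal rule, the truncation at
`u_max`, the value of `lam0` and the function names are assumptions made for this illustration.

```python
import math

lam0 = 1.5   # hypothetical intensity, as in the random telegram example

def R(u):
    """(1.48): R(u) = (1/4) exp(-2 lam0 |u|)."""
    return 0.25 * math.exp(-2.0 * lam0 * abs(u))

def f_numeric(lam, u_max=20.0, steps=20000):
    """(1/2pi) * integral of exp(-i u lam) R(u) du by the trapezoidal rule.
    R is even, so the integral reduces to 2 * integral_0^u_max cos(u lam) R(u) du."""
    du = u_max / steps
    total = 0.5 * (R(0.0) + math.cos(u_max * lam) * R(u_max))
    for j in range(1, steps):
        u = j * du
        total += math.cos(u * lam) * R(u)
    return 2.0 * total * du / (2.0 * math.pi)

def f_closed(lam):
    """Closed form quoted in the text: lam0 / (2 pi (lam^2 + 4 lam0^2))."""
    return lam0 / (2.0 * math.pi * (lam * lam + 4.0 * lam0 * lam0))
```

Because $R(u)$ decays exponentially, condition (1.60) holds and the truncation at `u_max` has a
negligible effect; the two functions agree to within the discretization error of the rule.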


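Example 1.7 (White Noise). Let $\xi_1, \xi_2, \ldots, \xi_n, \ldots$ be the white noise and $R(k)$ the
covariance function of $\xi_t$; then

$$ R(k) = \sigma^2\delta_{k,0}, $$

thus condition (1.53) is satisfied.

By the Wiener-Khinchin formula, we have

$$ f(\lambda) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} R(k) e^{-ik\lambda} = \frac{\sigma^2}{2\pi}, \quad \lambda \in [-\pi, \pi]. \tag{1.61} $$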

Example 1.8 (Sinusoid with Random Phase and Amplitude). Let

$$ \xi_t = \sum_{k=1}^{m} \zeta_k \exp\{i(\omega_k t + \theta_k)\}, \quad t \in R_1 \text{ or } t \in Z, $$

where $\omega_k \ne \omega_j$ for $k \ne j$, $\{\zeta_k, \theta_l, k, l = 1, \ldots, m\}$ are independent random variables,
$E\zeta_k = 0$, $E|\zeta_k|^2 = \sigma_{\zeta(k)}^2$, and $\{\theta_l, l = 1, \ldots, m\}$ are uniformly distributed on $[-\pi, \pi]$; then

$$ R_\xi(\tau) = \sum_{k,l=1}^{m} E\zeta_k\bar{\zeta}_l\,E\exp\{i(\omega_k(t + \tau) + \theta_k) - i(\omega_l t + \theta_l)\} = \sum_{k=1}^{m} \sigma_{\zeta(k)}^2 \exp(i\omega_k\tau), \tag{1.62} $$

where $\tau \in R_1$ or $\tau \in Z$.

Readers can easily see that (1.62) has the same form as (1.44). It means that the
covariance function $R(k)$ or the spectral function $F(\lambda)$ preserves only the frequency
information; the phases are lost.


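Example 1.9 (Filtration of White Noise). Let $\{\varepsilon_t, t \in Z\}$ be the white noise;
then the correlation function of the filtration (see (1.28)) is

$$ R_\xi(\tau) = \sum_{k=0}^{m} h_k h_{\tau+k}, \quad \tau \in Z. $$

Since $R(\tau) = 0$ when $\tau > m$ or $\tau < -m$, the inequality

$$ \sum_{\tau=-\infty}^{\infty} |R(\tau)| \le \sum_{\tau=-m}^{m} \sum_{k=0}^{m} |h_k||h_{\tau+k}| < +\infty $$

holds and the spectral density is

$$ f(\lambda) = \frac{1}{2\pi} \sum_{\mu=-\infty}^{\infty} R(\mu) e^{-i\mu\lambda} = \frac{1}{2\pi} \sum_{\mu=-\infty}^{\infty} \Bigl(\sum_{k=0}^{m} h_k h_{\mu+k}\Bigr) e^{-i\mu\lambda}. $$

Put $\mu = s - k$; then we have

$$ f(\lambda) = \frac{1}{2\pi} \sum_{k=0}^{m} \sum_{s=0}^{m} h_k h_s e^{-i(s-k)\lambda} = \frac{1}{2\pi} \Bigl|\sum_{k=0}^{m} h_k e^{-ik\lambda}\Bigr|^2. \tag{1.63} $$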
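
The two expressions for the spectral density of the filtered white noise, the Fourier series
(1.57) of $R(k)$ and the squared modulus (1.63), can be compared directly. The sketch below is an
illustration with hypothetical real coefficients `h` and assumed function names.

```python
import cmath
import math

h = [1.0, 0.6, -0.3]   # hypothetical real filter coefficients h_0, ..., h_m
m = len(h) - 1

def R(tau):
    """R(tau) = sum_k h_k h_{tau+k}, with h_mu = 0 outside 0..m."""
    tau = abs(tau)
    return sum(h[k] * h[k + tau] for k in range(m + 1 - tau)) if tau <= m else 0.0

def f_from_covariance(lam):
    """(1.57): f(lam) = (1/2pi) sum_k R(k) exp(-i k lam); only |k| <= m contribute."""
    s = sum(R(k) * cmath.exp(-1j * k * lam) for k in range(-m, m + 1))
    return s.real / (2.0 * math.pi)

def f_from_filter(lam):
    """(1.63): f(lam) = (1/2pi) |sum_k h_k exp(-i k lam)|^2."""
    H = sum(h[k] * cmath.exp(-1j * k * lam) for k in range(m + 1))
    return abs(H) ** 2 / (2.0 * math.pi)
```

The form (1.63) also makes the non-negativity (1.59) of the spectral density evident, which is
not obvious from the Fourier series alone.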
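Some of the most frequently used covariance functions and spectral density functions of
stationary processes are listed in Table 1.1.

Table 1.1

     covariance function $R(\tau)$                          spectral density function $f(\lambda)$

1.   $\sigma^2 e^{-\alpha\tau^2}$, $\alpha > 0$                     $\dfrac{\sigma^2}{2\sqrt{\pi\alpha}}\,e^{-\lambda^2/4\alpha}$

2.   $\sigma^2 e^{-\alpha\tau^2}\cos\beta\tau$, $\alpha > 0$        $\dfrac{\sigma^2}{4\sqrt{\pi\alpha}}\Bigl[e^{-(\lambda+\beta)^2/4\alpha} + e^{-(\lambda-\beta)^2/4\alpha}\Bigr]$

3.   $\sigma^2 e^{-\alpha|\tau|}\cos\beta\tau$, $\alpha > 0$        $\dfrac{\sigma^2\alpha}{2\pi}\Bigl[\dfrac{1}{(\lambda-\beta)^2+\alpha^2} + \dfrac{1}{(\lambda+\beta)^2+\alpha^2}\Bigr]$

4.   $A e^{-\alpha|\tau|}(1 + \alpha|\tau|)$                        $\dfrac{2A\alpha^3}{\pi(\lambda^2+\alpha^2)^2}$

5.   $R(\tau) = \sigma^2(1 - |\tau|)$ for $|\tau| \le 1$,           $\dfrac{\sigma^2}{2\pi}\Bigl(\dfrac{\sin(\lambda/2)}{\lambda/2}\Bigr)^2$
     $R(\tau) = 0$ for $|\tau| > 1$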


§1.3 The Hilbert Space of Second Order Processes

In this section, we shall introduce a very important functional space: the Hilbert
space of random variables which possess second order moments.

First of all, we recall the definition of a Hilbert space, which can be found in any
textbook of functional analysis.

Suppose that $H$ is a linear space on which an inner product is defined for any two
elements $\xi, \eta \in H$, denoted by $(\xi, \eta)$. The inner product $(\xi, \eta)$ is a complex value
which satisfies the following conditions:

1. $(a\xi, \eta) = a(\xi, \eta)$, for any complex number $a$, (1.64)

2. $(\xi, \eta) = \overline{(\eta, \xi)}$, (1.65)

3. $(\xi_1 + \xi_2, \eta) = (\xi_1, \eta) + (\xi_2, \eta)$, for any $\xi_1, \xi_2, \eta \in H$, (1.66)

4. $(\xi, \xi) \ge 0$, and “$= 0$” if and only if $\xi = 0$, (1.67)


where the right hand side of (1.67) is understood as the definition of the 0 element in
$H$.

According to (1.67), we may introduce the norm of each element
$\xi \in H$, which is defined as:

$$ \|\xi\| = \sqrt{(\xi, \xi)}. \tag{1.68} $$

Besides, $H$ is a complete linear space, i.e. for any Cauchy sequence $\xi_1, \ldots, \xi_n, \ldots \in H$,
$\|\xi_n - \xi_m\| \to 0$ when $n, m \to +\infty$, there exists a $\xi \in H$ such that

$$ \lim_{n \to +\infty} \xi_n = \xi. \tag{1.69} $$

In brief, we call $H$ a Hilbert space if it is a complete linear space with an inner
product.

Theorem 1.4 (Hilbert Space of Second Order Variables). Suppose that
$(\Omega, \mathcal{F}, P)$ is a probability space and $H$ is the set of random variables defined on
$(\Omega, \mathcal{F}, P)$ which possess second order moments, i.e. $E|\xi|^2 < +\infty$, $\xi \in H$. Define the inner
product of $\xi, \eta \in H$ as

$$ (\xi, \eta) = E\xi\bar{\eta}, \tag{1.70} $$

and the norm as

$$ \|\xi\| = (E|\xi|^2)^{1/2}; $$

then $H$ is a Hilbert space.

The proof of Theorem 1.4 can briefly be given as follows:

1. Linearity of $H$: let $\xi_1, \xi_2 \in H$, let $\alpha_1, \alpha_2$ be any two complex numbers and put
$\eta_1 = \alpha_1\xi_1$, $\eta_2 = \alpha_2\xi_2$; then the following Minkowski inequality holds:

$$ E|\eta_1 + \eta_2|^2 = |E(\eta_1 + \eta_2)\overline{(\eta_1 + \eta_2)}| $$
$$ \le |E|\eta_1|^2 + E|\eta_2|^2 + E(\eta_1\bar{\eta}_2) + E(\bar{\eta}_1\eta_2)| $$
$$ \le E|\eta_1|^2 + E|\eta_2|^2 + 2(E|\eta_1|^2 E|\eta_2|^2)^{1/2} $$
$$ = \bigl(\sqrt{E|\eta_1|^2} + \sqrt{E|\eta_2|^2}\bigr)^2. \tag{1.71} $$

(1.71) shows that

$$ \|\alpha_1\xi_1 + \alpha_2\xi_2\| \le |\alpha_1| \cdot \|\xi_1\| + |\alpha_2| \cdot \|\xi_2\| < +\infty. \tag{1.72} $$

2. $(\xi, \eta)$ satisfies the four conditions of an inner product:

a. $(a\xi, \eta) = E(a\xi\bar{\eta}) = aE(\xi\bar{\eta}) = a(\xi, \eta)$.

b. $(\xi_1 + \xi_2, \eta) = E(\xi_1 + \xi_2)\bar{\eta} = E\xi_1\bar{\eta} + E\xi_2\bar{\eta} = (\xi_1, \eta) + (\xi_2, \eta)$.

c. $(\xi, \eta) = E\xi\bar{\eta} = \overline{E(\eta\bar{\xi})} = \overline{(\eta, \xi)}$.

d. $(\xi, \xi) = E|\xi|^2 \ge 0$,

where “$= 0$” holds iff $\xi = 0$ with probability 1.

In the following study, we always assume that two elements of $H$ are equivalent
iff their difference is equal to zero with probability 1.

3. $H$ is a complete space: let $\xi_1, \xi_2, \ldots$ be a Cauchy sequence in $H$, i.e.

$$ \|\xi_n - \xi_m\| \to 0 \quad \text{when } n, m \to +\infty. \tag{1.73} $$

Now, we want to prove that there always exists an element $\zeta \in H$ such that
$\lim_{n \to +\infty} \xi_n = \zeta$.

By Chebyshev's inequality for random variables (see Shiryayev (1984)), for any
$\varepsilon > 0$ we have

$$ P\{|\xi_n - \xi_m| > \varepsilon\} \le \frac{1}{\varepsilon^2} E|\xi_n - \xi_m|^2 \to 0, \quad n, m \to +\infty. \tag{1.74} $$

Therefore, there always exists a subsequence $\xi_{k_n} \to \zeta$ (a.s.) when $n \to \infty$, and for fixed
$m$,

$$ \xi_m - \xi_{k_n} \to \xi_m - \zeta \quad \text{a.s. when } n \to \infty. \tag{1.75} $$

By (1.73) and Fatou's lemma (see Shiryayev (1984)), the inequality

$$ E|\xi_m - \zeta|^2 = E\bigl(\liminf_{n \to \infty} |\xi_m - \xi_{k_n}|^2\bigr) \le \liminf_{n \to \infty} E|\xi_m - \xi_{k_n}|^2 < \delta \tag{1.76} $$

holds for any $\delta > 0$ when $m$ is sufficiently large.

Accordingly, there exists a positive integer $m_0 > 0$ such that for any $m > m_0$,
$E|\zeta - \xi_m|^2 < \infty$, i.e.

$$ \zeta - \xi_m \in H, \tag{1.77} $$

thus $\zeta = \xi_m + (\zeta - \xi_m) \in H$, and by (1.76) $\|\xi_m - \zeta\| \to 0$.

We have proved that $H$ is a Hilbert space, but the whole of $H$ is not needed for studying a given stochastic process; a more convenient subspace is often sufficient. In fact, suppose that $\xi_t$, $t \in T$, is a second order stochastic process, then put

$$L[\xi_t] = L\{\xi_s, s \in T\}, \tag{1.78}$$

i.e. each element $\xi$ of $L[\xi_t]$ can be represented as

$$\xi = \sum_{k=1}^{N} a_k^{(N)} \xi_{t_k^{(N)}}, \tag{1.79}$$

where $t_k^{(N)} \in T$ and $\{a_k^{(N)}\}$ are complex numbers.

Evidently, $L[\xi_t]$ is a subset of $H$. Now consider the closure of $L[\xi_t]$, to be denoted by

$$H_\xi = \{L[\xi_t]\}^c = \mathcal{L}\{\xi_t, t \in T\}; \tag{1.80}$$

each element of $H_\xi$ can be represented as

$$\xi = \lim_{N\to\infty} \sum_{k=1}^{N} a_k^{(N)} \xi_{t_k^{(N)}}, \tag{1.81}$$

where $t_k^{(N)} \in T$ and $\{a_k^{(N)}\}$ are complex numbers. Then $H_\xi$ is a subspace of $H$.

In prediction theory, it is necessary to consider a still smaller subspace of $H_\xi$, that is, for any fixed $t \in T$, put

$$H_\xi(t) = \{L[\xi_s, s \le t]\}^c = \mathcal{L}\{\xi_s, s \le t\}, \tag{1.82}$$

then

$$H_\xi(t) \subset H_\xi \subset H. \tag{1.83}$$


§1.4 Stochastic Integral and the Isomorphic Relationship between $H_\xi$ and
the Functional Space $L^2(dF_\xi)$


1.4.1 Orthogonal Stochastic Measure. 

In order to define and to understand the stochastic representation of a stationary
series, we shall introduce some basic ideas on stochastic integrals in this section. The
stochastic representation of stationary random processes plays a very important role
both in theoretical studies and in applications.

Definition 1.5 (Orthogonal Stochastic Measure). Suppose that $(\Pi, \mathcal{B}(\Pi), F)$ is a measure space with $F(\Pi) < +\infty$. If to any $A \in \mathcal{B}(\Pi)$ there corresponds a random variable $Z(A) \in H$ such that the equality

$$(Z(A), Z(B))_H = F(A \cap B), \quad A, B \in \mathcal{B}(\Pi), \tag{1.84}$$

holds, where $(\cdot, \cdot)_H$ is the inner product in $H$, then $Z(\cdot)$ is called an orthogonal stochastic measure on $(\Pi, \mathcal{B}(\Pi), F)$.

It is not difficult to prove the following property: for $A_k \in \mathcal{B}(\Pi)$, $k = 1, 2, \ldots$, with $A_i \cap A_j = \emptyset$, $i \ne j$,

$$Z\Big(\bigcup_{k=1}^{\infty} A_k\Big) = \sum_{k=1}^{\infty} Z(A_k). \tag{1.85}$$

For example, let $\Pi = [-\pi, \pi]$, let $\mathcal{B}(\Pi)$ be the Borel field of all Borel sets in $\Pi$, and let $F_\xi$ be the spectral measure of a stationary series $\xi_t$,

$$F_\xi(B) = \int_B f_\xi(\lambda)\,d\lambda, \quad B \in \mathcal{B}(\Pi),$$

where $f_\xi(\lambda)$ is the spectral density function; then $(\Pi, \mathcal{B}(\Pi), F_\xi)$ is a measure space with $F_\xi(\Pi) = E|\xi_t|^2 < \infty$.

Put

$$Z(\lambda) = \frac{1}{2\pi}\Big(\lambda\xi_0 - \sum_{t \ne 0} \frac{\xi_t}{it} \exp\{-it\lambda\}\Big), \quad \lambda \in \Pi; \tag{1.86}$$

it can be proved (see Yaglom (1962)) that the series on the right hand side of (1.86) converges in $H$ and possesses the following orthogonal increment property:

$$(Z(\lambda_2) - Z(\lambda_1), Z(\lambda_4) - Z(\lambda_3))_H = F_\xi([\lambda_1, \lambda_2) \cap [\lambda_3, \lambda_4)). \tag{1.87}$$

Now, we can construct a stochastic measure in the following way:

1. For any interval $[a, b) \subset \Pi$, we define the stochastic measure

$$Z[a, b) = Z(b) - Z(a), \quad Z[a, b) \in H; \tag{1.88}$$

evidently, (1.84) will be satisfied owing to the equality (1.87).

2. Extend the measure $Z(\cdot)$ to the class $Q = \{\bigcup_k [a_k, b_k) : [a_k, b_k) \subset \Pi, \text{ non-overlapping}\}$, i.e., for $I \in Q$, $I = \bigcup_{k=1}^{n} [a_k, b_k) = \bigcup_{k=1}^{n} I_k$, define

$$Z(I) = \sum_{k=1}^{n} Z(I_k) = \sum_{k=1}^{n} (Z(b_k) - Z(a_k)) \in H. \tag{1.89}$$

It is also easy to see that $Z(I)$ keeps (1.84) true in $Q$.

3. Extend $Z(\cdot)$ to the sigma field $\mathcal{B}(Q) = \mathcal{B}(\Pi)$, which will keep the equality

$$(Z(A), Z(B)) = F_\xi(A \cap B) = \int_{A \cap B} f_\xi(\lambda)\,d\lambda, \quad A, B \in \mathcal{B}(Q) \tag{1.90}$$

true.

These are the typical steps for constructing an orthogonal measure based on a stationary series and a spectral density $f_\xi(\lambda)$.


1.4.2 Stochastic Integral and the Representation of Stationary Processes. 

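Suppose that $(\Pi, \mathcal{B}(\Pi), F)$ is a measure space and $Z(\cdot)$ is an orthogonal stochastic measure defined on $\mathcal{B}(\Pi)$, then a stochastic integral $I(\cdot)$ taking values in $H$ can be introduced in the following way:

First of all, we introduce a functional Hilbert space

$$L^2(dF) = \Big\{\varphi : \int_\Pi |\varphi|^2\,dF < +\infty\Big\}; \tag{1.91}$$

the inner product in $L^2(dF)$ is defined as

$$(\varphi, \psi)_{L^2} = \int_\Pi \varphi\bar{\psi}\,dF, \quad \varphi, \psi \in L^2(dF). \tag{1.92}$$

For the indicator function $\chi_A(\lambda)$, $A \in \mathcal{B}(\Pi)$, we define the stochastic integral of $\chi_A(\lambda)$ as

$$I(\chi_A) = \int_\Pi \chi_A(\lambda)\,dZ = Z(A), \quad A \in \mathcal{B}(\Pi). \tag{1.93}$$

Then, for $\varphi \in L[\chi_{A_i}]$, i.e.

$$\varphi = \sum_{i=1}^{n} a_i \chi_{A_i}(\lambda), \quad A_i \in \mathcal{B}(\Pi), \; A_i \cap A_j = \emptyset, \; i \ne j, \tag{1.94}$$

the stochastic integral of $\varphi$ is defined as

$$I(\varphi) = \sum_{i=1}^{n} a_i Z(A_i) \in H. \tag{1.95}$$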


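Now, for any two elements $\varphi = \sum_k c_k \chi_{A_k}$ and $\psi = \sum_l d_l \chi_{B_l}$ of $L[\chi_{A_i}]$, the following equalities of inner products

$$\begin{aligned}
(I(\varphi), I(\psi))_H &= \Big(\int_\Pi \varphi\,dZ, \int_\Pi \psi\,dZ\Big)_H \\
&= \sum_k \sum_l c_k \bar{d}_l (Z(A_k), Z(B_l))_H \\
&= \Big(\sum_k c_k \chi_{A_k}, \sum_l d_l \chi_{B_l}\Big)_{L^2} \\
&= \int_\Pi \varphi\bar{\psi}\,dF \\
&= (\varphi, \psi)_{L^2}
\end{aligned} \tag{1.96}$$

hold. Hence, if $\varphi, \psi \in L[\chi_{A_i}]$, then we have

$$\Big\|\int_\Pi \varphi\,dZ - \int_\Pi \psi\,dZ\Big\|_H^2 = \|I(\varphi) - I(\psi)\|_H^2 = \|\varphi - \psi\|_{L^2}^2; \tag{1.97}$$

together with (1.96) this shows that the mapping $\varphi \to I(\varphi)$, i.e. from $L[\chi_{A_i}]$ to the set

$$L[Z] = \{I(\varphi) : \varphi \in L[\chi_{A_i}]\} \subset H, \tag{1.98}$$

preserves the "distance" and inner product of any two elements.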

Finally, for any $\varphi \in L^2(dF)$, we can select a Cauchy sequence $\varphi_n \in L[\chi_{A_i}]$, $n = 1, 2, \ldots$, such that

$$\lim_{n\to\infty} \varphi_n = \varphi \quad (\text{in } L^2(dF)). \tag{1.99}$$

By the definition (1.95), $\{I(\varphi_n)\}$ is well defined and we have

$$\|I(\varphi_n) - I(\varphi_m)\|_H = \|\varphi_n - \varphi_m\|_{L^2} \to 0, \quad n, m \to \infty, \tag{1.100}$$

by (1.97). Thus (1.100) shows that $\{I(\varphi_n)\}$ is a Cauchy sequence in $H$, so there exists an element $\xi \in H$ such that

$$\lim_{n\to\infty} I(\varphi_n) = \xi. \tag{1.101}$$

Now, we define the stochastic integral of $\varphi$ as

$$I(\varphi) = \int_\Pi \varphi\,dZ = \xi \tag{1.102}$$

and denote the set of all "images" $I(\varphi)$, $\varphi \in L^2(dF)$, as

$$\mathcal{L}[Z] = \{I(\varphi) : \varphi \in L^2(dF)\}. \tag{1.103}$$

Evidently, $\mathcal{L}[Z]$ is also a Hilbert subspace of $H$, and the following equalities are preserved:

$$\begin{aligned}
(I(\varphi), I(\psi))_H &= \big(\lim_n I(\varphi_n), \lim_m I(\psi_m)\big)_H \\
&= \lim_n \lim_m (I(\varphi_n), I(\psi_m))_H \\
&= \big(\lim_n \varphi_n, \lim_m \psi_m\big)_{L^2} \\
&= (\varphi, \psi)_{L^2}, \quad \varphi, \psi \in L^2(dF),
\end{aligned} \tag{1.104}$$

$$\|I(\varphi) - I(\psi)\|_H^2 = \|\varphi - \psi\|_{L^2(dF)}^2. \tag{1.105}$$

Now, if we consider the stochastic integral $I(\cdot)$ as a mapping (an "image"), then this mapping is from $L^2(dF)$ onto $\mathcal{L}[Z]$, and the two are isomorphic Hilbert spaces under this mapping.

In fact, (1.105) shows that if $\varphi \ne \psi$, then $I(\varphi) \ne I(\psi)$ (in $\mathcal{L}[Z]$), and thus, for any element $\xi \in \mathcal{L}[Z]$, we have the unique "inverse image" $\varphi$ such that $I^{-1}(\xi) = \varphi$; on combining this with (1.104) the conclusion is clear.

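Suppose now that $\xi_t$, $t = 0, \pm 1, \ldots$, is a stationary series and $F_\xi(\lambda)$, $\lambda \in \Pi = [-\pi, \pi]$, is its spectral function; then we can define a stochastic measure $Z_\xi(\cdot)$ (see (1.86)-(1.90)) which is induced by $\xi_t$, and so we have two isomorphic Hilbert spaces

$$\mathcal{L}[Z_\xi] \cong L^2(dF_\xi); \tag{1.106}$$

then it can be proved that

$$\mathcal{L}[Z_\xi] = H_\xi = \mathcal{L}\{\xi_t, t = 0, \pm 1, \ldots\} \tag{1.107}$$

holds. In fact, according to the definitions of $Z_\xi(\cdot)$ and $I(\cdot)$, $\mathcal{L}[Z_\xi] \subset H_\xi$ is evident. Conversely, for $\xi \in H_\xi$, we can prove that $\xi \in \mathcal{L}[Z_\xi]$, and this follows from the fact that $\xi_t \in \mathcal{L}[Z_\xi]$.

Avoiding the cumbersome analytical derivation of the proof of $\xi_t \in \mathcal{L}[Z_\xi]$, we only indicate here the main ideas (not a rigorous proof), as follows:

By the definition of $Z_\xi(\lambda)$ in (1.86), we have

$$\begin{aligned}
I(e^{ik\lambda}) &= \int_\Pi e^{ik\lambda}\,dZ_\xi(\lambda) \\
&= \int_\Pi e^{ik\lambda} \Big(\frac{d}{d\lambda} Z_\xi(\lambda)\Big) d\lambda \\
&= \int_\Pi e^{ik\lambda} \Big(\sum_t \xi_t e^{-it\lambda} / 2\pi\Big) d\lambda \\
&= \xi_k, \qquad k = 0, \pm 1, \ldots
\end{aligned} \tag{1.108}$$

Since $e^{ik\lambda} \in L^2(dF_\xi)$, it follows that $\xi_k \in \mathcal{L}[Z_\xi]$.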

Combining (1.107) and (1.108), we have the following theorems.

Theorem 1.4 (Stochastic Integral Representation of Stationary Series). Suppose that $\xi_t$, $t = 0, \pm 1, \ldots$, is a stationary series, $F_\xi(\lambda)$ is its spectral function, and $(\Pi, \mathcal{B}(\Pi), F_\xi)$ is a measure space, where $F_\xi$ is the spectral measure determined by $F_\xi(\lambda)$; then there exists an orthogonal stochastic measure $Z_\xi$ such that

$$\xi_t = \int_\Pi e^{i\lambda t}\,dZ_\xi(\lambda), \quad t = 0, \pm 1, \ldots \tag{1.109}$$

and under the "image" $I(\cdot)$ of the stochastic integral (1.102), $H_\xi$ and $L^2(dF_\xi)$ are two isomorphic Hilbert spaces.


Theorem 1.4 can be generalized to the continuous parameter stationary processes. 


Theorem 1.5. Suppose that $\xi_t$, $t \in R_1$, is a stationary process, $F_\xi(\lambda)$ is its spectral function, and $(R_1, \mathcal{B}_1, F_\xi)$ is a measure space; then there exists an orthogonal stochastic measure $Z_\xi(\cdot)$, $\cdot \in \mathcal{B}_1$*, such that

$$\xi_t = \int_{R_1} e^{i\lambda t}\,dZ_\xi(\lambda), \quad t \in R_1, \tag{1.110}$$

and under the "image" $I(\cdot)$ of the stochastic integral, $H_\xi$ and $L^2(dF_\xi)$ are two isomorphic Hilbert spaces.

1.4.3 Karhunen Theorem.

This section introduces a theorem on the stochastic integral representation of second order processes and its applications.

In fact, the stochastic integral representations (1.109), (1.110) of random processes can be generalized to second order processes without stationarity (see Yaglom (1987)).

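Theorem 1.6 (Karhunen). Suppose that $\{\xi_t, t \in T\}$ is a second order process ($E\xi_t = 0$), and its covariance function $R(t, s) = E\xi_t\bar{\xi}_s$ can be represented as

$$R(t, s) = \int_\Lambda f(t, \lambda)\overline{f(s, \lambda)}\,dF, \quad t, s \in T, \tag{1.111}$$

where $\Lambda \in \mathcal{B}_k$, $(\Lambda, \mathcal{B}(\Lambda), F)$ is a measure space with $F(\Lambda) < +\infty$; then there exists an orthogonal measure $Z(\cdot)$, $\cdot \in \mathcal{B}(\Lambda)$, such that

$$\xi_t = \int_\Lambda f(t, \lambda)\,dZ, \quad t \in T. \tag{1.112}$$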


Now, we want to show some of the applications of Karhunen theorem. 

Theorem 1.7 (Orthogonal Expansion of Stationary Series). Let $\xi_t$, $t = 0, \pm 1, \ldots$, be a stationary series, then $\xi_t$ can be represented as

$$\xi_t = \sum_{k=-\infty}^{\infty} a_k \varepsilon_{t-k}, \quad t = 0, \pm 1, \ldots, \tag{1.113}$$

if and only if the spectral density $f_\xi(\lambda)$ exists, where $\{\varepsilon_t\}$ is white noise and $\sum_k |a_k|^2 < +\infty$.


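Proof. Suppose that the spectral density $f_\xi(\lambda)$ exists, then $(\Pi, \mathcal{B}(\Pi), F_\xi)$ is a measure space, where $\Pi = [-\pi, \pi]$,

$$F_\xi(B) = \int_B f_\xi(\lambda)\,d\lambda, \quad \forall B \in \mathcal{B}(\Pi).$$

Since

$$E\xi_t\bar{\xi}_s = R(t - s) = \int_\Pi e^{i(t-s)\lambda} f_\xi(\lambda)\,d\lambda, \tag{1.114}$$

the Karhunen theorem (with $f(t, \lambda) = e^{i\lambda t}\sqrt{f_\xi(\lambda)}$ and $dF = d\lambda$ in (1.111)) shows that there exists an orthogonal measure $Z(\cdot)$, $\cdot \in \mathcal{B}(\Pi)$, such that

$$\xi_t = \int_\Pi e^{i\lambda t} \sqrt{f_\xi(\lambda)}\,dZ. \tag{1.115}$$

*$\mathcal{B}_1$ denotes the Borel field of $R_1$.

On the other hand, $\sqrt{f_\xi(\lambda)} \in L^2(d\lambda)$, so it can be expanded as a Fourier series

$$\sqrt{f_\xi(\lambda)} = \sum_{k=-\infty}^{\infty} a_k \frac{e^{-ik\lambda}}{\sqrt{2\pi}} \quad (\text{in } L^2(d\lambda)); \tag{1.116}$$

thus (1.115) can be rewritten as

$$\xi_t = \int_\Pi e^{i\lambda t} \Big(\sum_{k=-\infty}^{\infty} a_k \frac{e^{-ik\lambda}}{\sqrt{2\pi}}\Big) dZ = \sum_{k=-\infty}^{\infty} a_k \varepsilon_{t-k}, \tag{1.117}$$

where $\varepsilon_t = \frac{1}{\sqrt{2\pi}} \int_\Pi e^{i\lambda t}\,dZ$, which is a white noise series. In fact,

$$E\varepsilon_t\bar{\varepsilon}_s = \Big(\frac{1}{\sqrt{2\pi}} \int_\Pi e^{i\lambda t}\,dZ, \frac{1}{\sqrt{2\pi}} \int_\Pi e^{i\lambda s}\,dZ\Big)_H = \frac{1}{2\pi} \int_\Pi e^{i\lambda(t-s)}\,d\lambda = \delta_{t,s}, \tag{1.118}$$

and $E\varepsilon_t = 0$ is evident since we always assume that the mean of the stationary series is equal to 0. That proves the sufficiency of the theorem.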

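Now, if $\xi_t$ can be represented as (1.113), where $\varepsilon_t$ is white noise, then there exists a stochastic measure $dZ_\varepsilon$ such that

$$\varepsilon_t = \int_\Pi e^{it\lambda}\,dZ_\varepsilon \tag{1.119}$$

and

$$f_\varepsilon(\lambda) = \frac{1}{2\pi}, \quad \lambda \in \Pi.$$

Hence, we have

$$\begin{aligned}
\xi_t &= \sum_{k=-\infty}^{\infty} a_k \varepsilon_{t-k} \\
&= \sum_{k=-\infty}^{\infty} a_k \int_\Pi e^{i(t-k)\lambda}\,dZ_\varepsilon \\
&= \int_\Pi \Big(\sum_{k=-\infty}^{\infty} a_k e^{-ik\lambda}\Big) e^{it\lambda}\,dZ_\varepsilon,
\end{aligned} \tag{1.120}$$

since $\{a_k\}$ are square summable.

The covariance function of $\xi_t$ is

$$R_\xi(\tau) = (\xi_{t+\tau}, \xi_t)_H = \int_\Pi e^{i\tau\lambda} \frac{1}{2\pi} \Big|\sum_{k=-\infty}^{\infty} a_k e^{-ik\lambda}\Big|^2 d\lambda, \quad \tau = 0, \pm 1, \pm 2, \ldots \tag{1.121}$$

Then according to Theorem 1.3, we know

$$f_\xi(\lambda) = \frac{1}{2\pi} \Big|\sum_{k=-\infty}^{\infty} a_k e^{-ik\lambda}\Big|^2, \quad \lambda \in \Pi. \tag{1.122}$$

That completes the proof.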
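
The relation between (1.113), (1.121) and (1.122) can be checked numerically. The following is only a minimal sketch, assuming real coefficients $a_k$ that vanish outside a finite range $k = 0, \ldots, m$ (a restriction made for the illustration, not one imposed by the theorem); the function names are hypothetical. It compares the covariance obtained directly from the coefficients, $R(\tau) = \sum_k a_{k+\tau} a_k$, with the integral of $e^{i\tau\lambda} f_\xi(\lambda)$ over $[-\pi, \pi]$.

```python
import cmath
import math


def covariance_from_coeffs(a, tau):
    """R(tau) = sum_k a[k + tau] * a[k] for a real, finite coefficient list a."""
    tau = abs(tau)
    return sum(a[k + tau] * a[k] for k in range(len(a) - tau))


def spectral_density(a, lam):
    """f(lam) = |sum_k a[k] exp(-i k lam)|^2 / (2 pi), the form of (1.122)."""
    transfer = sum(ak * cmath.exp(-1j * k * lam) for k, ak in enumerate(a))
    return abs(transfer) ** 2 / (2.0 * math.pi)


def covariance_from_density(a, tau, grid=4096):
    """Midpoint-rule approximation of the integral of exp(i tau lam) f(lam) over [-pi, pi].

    For real coefficients f is even, so only the cosine part contributes.
    """
    step = 2.0 * math.pi / grid
    total = 0.0
    for j in range(grid):
        lam = -math.pi + (j + 0.5) * step
        total += math.cos(tau * lam) * spectral_density(a, lam)
    return total * step


if __name__ == "__main__":
    coeffs = [1.0, 0.5, 0.25]  # illustrative values only
    for tau in range(4):
        print(tau, covariance_from_coeffs(coeffs, tau), covariance_from_density(coeffs, tau))
```

Because the integrand is a trigonometric polynomial, the midpoint rule on a uniform grid reproduces the integral up to rounding error, so the two columns printed by the usage example should agree closely.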


§1.5 Strong Law of Large Numbers for Stationary Series 

In time series analysis, the starting point for estimating the first or second order
moments of a series is frequently an observed sample $x_1, x_2, \ldots, x_n$,
i.e. a finite segment of one realization of the process. So the estimation theory is
quite different from the classical statistical method, since we now have only one
observed series and in practice there is sometimes no possibility of repeating the
observation.

In this section, we shall introduce some very important results on the strong law
of large numbers in time series analysis for estimating the first and second order
moments, which are very often used both in theoretical analysis and in applications.

Theorem 1.8 (Strong Law of Large Numbers in FFT). Suppose that $x_t$ is a stationary series with zero mean $Ex_t = 0$, and $R(k)$ is the covariance function of $x_t$. If for sufficiently large $n$,

$$\frac{1}{n} \sum_{j=1-n}^{n-1} \Big(1 - \frac{|j|}{n}\Big) R(j) e^{-ij\lambda} = O\Big(\frac{1}{n^\alpha}\Big) \tag{1.123}$$

holds for $\lambda \in \Pi$ uniformly, where $\alpha > 0$, then

$$\lim_{n\to+\infty} \frac{1}{n} \sum_{j=1}^{n} x_j e^{-ij\lambda} = 0, \quad \text{a.s.}^* \tag{1.124}$$

*a.s. means that, with probability 1, for sufficiently large $n$, we have $\sum_{j=1}^{n} x_j \exp(-ij\lambda) = o(n)$, $\lambda \in \Pi$.

The complete proof of this theorem is rather cumbersome; the reader can find it in the book of J. L. Doob (1953).

By using Theorem 1.8, we can obtain some very useful theorems and conclusions in the sequel.

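Corollary 1. Suppose that the covariance function $R(k)$ is summable,

$$\sum_{k=-\infty}^{\infty} |R(k)| < +\infty,$$

then the conclusion of Theorem 1.8 keeps true.

Proof. Since

$$\Big|\frac{1}{n} \sum_{j=1-n}^{n-1} \Big(1 - \frac{|j|}{n}\Big) R(j) e^{-ij\lambda}\Big| \le \frac{1}{n} \sum_{j=-\infty}^{\infty} |R(j)| = \frac{K}{n},$$

where $K$ is a constant and $\alpha = 1$, so (1.124) holds. ■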
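
At $\lambda = 0$ the weighted sum in the proof above is exactly the variance of the sample mean $\frac{1}{n}\sum_{j=1}^{n} x_j$, so the bound $K/n$ can be illustrated with a small computation. The following is a minimal sketch under the assumption of a moving average $x_t = \varepsilon_t + b\varepsilon_{t-1}$ with standard white noise (an example chosen here for illustration, with hypothetical function names), for which $R(0) = 1 + b^2$, $R(\pm 1) = b$ and $K = 1 + b^2 + 2|b|$.

```python
def ma1_covariance(b):
    """Covariance function of x_t = eps_t + b * eps_{t-1} with unit-variance white noise."""
    def R(k):
        k = abs(k)
        if k == 0:
            return 1.0 + b * b
        if k == 1:
            return b
        return 0.0
    return R


def mean_variance(R, n):
    """(1/n) * sum_{j=1-n}^{n-1} (1 - |j|/n) R(j): the variance of the sample mean of n terms."""
    return sum((1.0 - abs(j) / n) * R(j) for j in range(1 - n, n)) / n


if __name__ == "__main__":
    b = 0.8  # illustrative value only
    R = ma1_covariance(b)
    K = 1.0 + b * b + 2.0 * abs(b)  # sum of |R(k)| over all k
    for n in (1, 10, 100, 1000):
        print(n, mean_variance(R, n), K / n)
```

The first printed column stays below the second for every $n$, which is the case $\alpha = 1$ of condition (1.123) at $\lambda = 0$.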

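Corollary 2. Let $x_t$ be a stationary series with $Ex_t = a$, whose covariance function $R(k)$ satisfies the condition

$$\frac{1}{n} \sum_{j=1-n}^{n-1} \Big(1 - \frac{|j|}{n}\Big) R(j) = O\Big(\frac{1}{n^\alpha}\Big), \quad \alpha > 0, \tag{1.125}$$

then

$$\lim_{n\to+\infty} \frac{1}{n} \sum_{j=1}^{n} x_j = Ex_t = a, \quad \text{a.s.} \tag{1.126}$$

Proof. Put $\tilde{x}_j = x_j - a$, $j = 0, \pm 1, \pm 2, \ldots$, then $x_j$ and $\tilde{x}_j$ have the same covariance function $R(k)$. Hence, (1.125) induces the equality

$$\lim_{n\to+\infty} \frac{1}{n} \sum_{j=1}^{n} \tilde{x}_j = 0, \quad \text{a.s.}, \tag{1.127}$$

by Theorem 1.8. ■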

Theorem 1.9 (Strong Law of Large Numbers for Covariance Function). Suppose that $x_t$ is a stationary series with zero mean and that $X_n = x_{n+v}\bar{x}_n$ is also stationary for any fixed $v$. If there exist two constants $K$ and $\alpha > 0$ such that for sufficiently large $n$

$$\frac{1}{n} \sum_{j=1-n}^{n-1} \Big(1 - \frac{|j|}{n}\Big) E(X_{n+j}\bar{X}_n) - |R_x(v)|^2 = O\Big(\frac{1}{n^\alpha}\Big) \tag{1.128}$$

holds, then

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} x_{j+v}\bar{x}_j = R_x(v), \quad \text{a.s.}, \tag{1.129}$$

where $R_x(v)$ is the covariance function of $x_t$.


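Proof. For any fixed $v$, since $x_t$ is a stationary series,

$$EX_n = Ex_{n+v}\bar{x}_n = R_x(v) \tag{1.130}$$

does not depend on $n$, and we put

$$y_n = X_n - R_x(v), \quad n = 0, \pm 1, \pm 2, \ldots \tag{1.131}$$

to be a stationary series with mean zero. Its covariance function is

$$R_y(j) = E(X_{n+j}\bar{X}_n) - |R_x(v)|^2.$$

The condition (1.125) in Corollary 2 now is

$$\begin{aligned}
\frac{1}{n} \sum_{j=1-n}^{n-1} \Big(1 - \frac{|j|}{n}\Big) R_y(j) &= \frac{1}{n} \sum_{j=1-n}^{n-1} \Big(1 - \frac{|j|}{n}\Big) \big[E(X_{n+j}\bar{X}_n) - |R_x(v)|^2\big] \\
&= \frac{1}{n} \sum_{j=1-n}^{n-1} \Big(1 - \frac{|j|}{n}\Big) E(X_{n+j}\bar{X}_n) - |R_x(v)|^2 \\
&\le \frac{K}{n^\alpha}, \quad (\alpha > 0).
\end{aligned} \tag{1.132}$$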


The equality of the left hand side of (1.132) is based on the fact 



(1.133) 


Now, according to the result of Corollary 2 for the series $y_t$, we have

$$\lim_{n\to+\infty} \frac{1}{n} \sum_{j=1}^{n} y_j = \lim_{n\to+\infty} \frac{1}{n} \sum_{j=1}^{n} (X_j - R_x(v)) = 0, \quad \text{a.s.}, \tag{1.134}$$

i.e.

$$\lim_{n\to+\infty} \frac{1}{n} \sum_{j=1}^{n} x_{j+v}\bar{x}_j = R_x(v), \quad \text{a.s.}, \tag{1.135}$$

and that ends the proof. ■

Corollary. Let $x_t$ be a real stationary Gaussian series with $Ex_t = 0$, whose covariance function $R_x(k)$ satisfies the inequality

$$|R_x(k)| \le \frac{K}{k^\alpha} \tag{1.136}$$

for sufficiently large $k$, where $K, \alpha > 0$ are constants. Then for any fixed $v$,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} x_j x_{j+v} = R_x(v), \quad \text{a.s.} \tag{1.137}$$


Proof. Put

$$y_n = X_n - R_x(v), \quad n = 0, \pm 1, \pm 2, \ldots,$$

where $X_n$ is defined as in Theorem 1.9, then

$$R_y(n) = EX_{n+s}X_s - |R_x(v)|^2.$$

Since $x_t$ is Gaussian, it is not difficult to prove that

$$\begin{aligned}
EX_{n+s}X_s &= E(x_{v+n+s}\,x_{n+s}\,x_{v+s}\,x_s) \\
&= R_x^2(v) + R_x^2(n) + R_x(n + v)R_x(n - v),
\end{aligned} \tag{1.138}$$

and

$$R_y(n) = R_x^2(n) + R_x(n + v)R_x(n - v). \tag{1.139}$$

So we have

$$|R_y(n)| \le \frac{K^2}{n^{2\alpha}} + \frac{K}{(n + v)^\alpha} \cdot \frac{K}{(n - v)^\alpha} = \frac{K^2}{n^{2\alpha}} + \frac{K^2}{(n^2 - v^2)^\alpha} \le \frac{c_1}{n^{2\alpha}}, \quad c_1 > 0, \tag{1.140}$$

and the condition (1.123) of Theorem 1.8 can be checked as follows:


|5 n |= 


巧二 d 灿>- 

,7 = 1 —n • 


< 


2 




i=0 


E i^0')i 5 




E I«y0')l 2 + E \ R yOW 


3=0 


<2, 


它 |i? y 0.)|2 + i £ |iJ y (y)|2 ， (1.141) 


where m is a positive integer, when n > m the equality (1.136) holds. 
Now, if ol> 1, then 


if 0 < a < 1, then 


1^1 — 


l^n| < 2, 


<2, 


<2, 


co 1 ci 2 

- + -Z.7^^ ^T72 C2 


+ - 4 E 

n a ^ 


co 

n n 


K 


n( l—a )j 2a 


n n Q ^ •> 


K 


(l+a) 


C 0 1 

7T + ^ 


E 


K 


- 2 V? + ^- n-/2- 


y(l+a) 

c 4 


By the Corollary 2 of Theorem 1.8, we have 
1 n l n 

n V ^o n^^ = n V ^on Eh &㈦ )= 0, 


(1.142) 


(1.143) 


a.s.. 
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In the case of Gaussian stationary series, we can have stronger results on the 
covariance estimation, which will be listed as follows with the proof omitted. 

1. When the covariance function $R(k)$ of a Gaussian stationary series $x_t$ satisfies
the condition

1 n 

lim - ^2 = °» (1.144) 

n— »oo n 

；=1 

then (1.129) keeps true. 

2. For Gaussian stationary series, when the spectral function $F_x(\lambda)$ is continuous
on $[-\pi, \pi]$, then (1.129) still keeps true.

When $\sum_k |R_x(k)| < +\infty$, we know that the spectral density function $f_x(\lambda)$ exists,
thus $F_x(\lambda)$ will be continuous on $[-\pi, \pi]$ and (1.129) still holds.

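All of the results mentioned above are very important in time series analysis. In particular, they show that under some general conditions we have

$$\begin{cases}
\displaystyle \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} x_k(\omega) = E\xi_t(\omega), & \text{a.s.} \\[2ex]
\displaystyle \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} x_{k+\tau}(\omega)x_k(\omega) = E\xi_{t+\tau}(\omega)\xi_t(\omega), & \text{a.s.},
\end{cases} \tag{1.145}$$

where $\xi_t(\omega)$ is the stationary series, and $\{x_k(\omega), k = 1, 2, \ldots\}$ is a realization of $\xi_t(\omega)$ or an observed sample series in practice.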

The formulas in (1.145) show that "the time average of a sample series is equal
to the statistical mean". Apparently, these results strengthen the applicability of
time series theory and methods in practice, since in many practical problems
the researcher can only obtain a single observation record and it is extremely difficult
to repeat the observation.

For example, in economics, one wants to make a 5-year forecast of
unemployment, but the Statistical Annals can only offer the data
from 1951-1990 as $x_1, \ldots, x_n$; evidently, such a record cannot be repeated.
Similar situations can also be found in astronomy, geology, etc.

Fortunately, some of the theoretical results and methods in time series analysis
are based on the second order moments of the series, so the results of (1.145) become
extremely important.
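
The statement "time average equals statistical mean" in (1.145) can be illustrated by simulation. The following is a minimal sketch, assuming a zero-mean Gaussian moving average $x_t = \varepsilon_t + b\varepsilon_{t-1}$ (a model chosen here only for illustration, with hypothetical function names); its theoretical moments are $Ex_t = 0$, $R(0) = 1 + b^2$, $R(1) = b$ and $R(2) = 0$, and the time averages of a single long record should be close to these values.

```python
import random


def simulate_ma1(n, b, seed=0):
    """One realization x_1..x_n of x_t = eps_t + b * eps_{t-1}, eps_t standard Gaussian."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [eps[t + 1] + b * eps[t] for t in range(n)]


def time_average(x):
    """(1/n) * sum_k x_k, the first formula of (1.145)."""
    return sum(x) / len(x)


def time_average_covariance(x, lag):
    """Average of x_{k+lag} * x_k over the record, the second formula of (1.145)."""
    n = len(x) - lag
    return sum(x[k + lag] * x[k] for k in range(n)) / n


if __name__ == "__main__":
    b = 0.5  # illustrative value only
    record = simulate_ma1(50000, b, seed=1)
    print("mean:", time_average(record))
    for lag in range(3):
        print("lag", lag, time_average_covariance(record, lag))
```

Only one record is generated, as in the practical situation described above; no averaging over repeated experiments is used.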


§1.6 Sampling Theorem for Stochastic Stationary Processes 

In the preceding sections, we particularly emphasized the theory of stationary series.
On the one hand, many practical observations are originally recorded in a
"discrete" way, e.g. the unemployment data (for each month) are recorded as a series;
on the other hand, the development of computer techniques forces researchers
to process their data digitally, i.e. to digitize their records from a continuous
curve to digital data, e.g. by an A-D converter or a digital tablet for computer
input.

However, when people want to digitize their continuous records into digital data,
a correct way of sampling that preserves as much of the original information as possible
is necessary. Otherwise, it can be shown that results based on a wrong
sampling method cannot present the true picture of the original process.

The following sampling theorem arose originally in electrical engineering for
deterministic functions; it is now usually called Shannon's sampling theorem.


Theorem 1.10 (Sampling Theorem). Suppose that $\xi(t)$, $t \in R_1$, is a stationary process whose spectral function $F_\xi(\lambda)$ satisfies the condition

$$\int_{|\lambda| > 2\pi W} dF_\xi(\lambda) = 0 \tag{1.146}$$

for some positive number $W$; then the following series representation of the process

$$\xi(t) = \sum_{n=-\infty}^{\infty} \xi\Big(\frac{n}{2W}\Big) \frac{\sin(2\pi Wt - n\pi)}{2\pi Wt - n\pi}, \quad t \in R_1, \tag{1.147}$$

holds, where the convergence of the series is understood to be in $H$.


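Proof. By the stochastic integral representation of stationary processes, there exists an orthogonal measure $Z_\xi(\cdot)$ such that

$$\xi(t) = \int_{R_1} e^{i\lambda t}\,dZ_\xi, \quad t \in R_1, \tag{1.148}$$

and (1.146) shows that we can rewrite (1.148) as

$$\xi(t) = \int_{-2\pi W}^{2\pi W} e^{i\lambda t}\,dZ_\xi, \quad t \in R_1, \tag{1.149}$$

since for any Borel set $A \in \mathcal{B}_1$ with $A \subset (-\infty, -2\pi W) \cup (2\pi W, \infty)$, we have

$$\Big\|\int_A e^{i\lambda t}\,dZ_\xi\Big\|^2 = \Big\|\int_{R_1} \chi_A e^{i\lambda t}\,dZ_\xi\Big\|^2 = \int_{R_1} |\chi_A e^{i\lambda t}|^2\,dF_\xi = \int_A dF_\xi = 0. \tag{1.150}$$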


Now, for fixed $t$, $e^{i\lambda t}$ can be expanded as the following Fourier series on $[-2\pi W, 2\pi W]$:

$$e^{i\lambda t} = \sum_{n=-\infty}^{\infty} C_n(t) e^{in\lambda/(2W)}, \tag{1.151}$$

where

$$C_n(t) = \frac{1}{4\pi W} \int_{-2\pi W}^{2\pi W} e^{i\lambda(t - \frac{n}{2W})}\,d\lambda = \frac{\sin(2\pi Wt - n\pi)}{2\pi Wt - n\pi}. \tag{1.152}$$

Since $e^{i\lambda t}$ is a continuous function of $\lambda$ with bounded variation on $[-2\pi W, 2\pi W]$, the partial sum $S_n(\lambda)$ of the Fourier series is uniformly bounded for any $n$ and converges to $e^{i\lambda t}$ on $(-2\pi W, 2\pi W)$ (see T. Kawata (1974)). By the well known Lebesgue convergence theorem, we have



(1.153) 


(1.154) 


i.e.

$$\xi(t) = \sum_{n=-\infty}^{\infty} \xi\Big(\frac{n}{2W}\Big) \frac{\sin(2\pi Wt - n\pi)}{2\pi Wt - n\pi}, \tag{1.155}$$

that ends the proof of the theorem. ■

In practical applications, it is more common to assume that the spectral density $f(\lambda)$ satisfies

$$f(\lambda) = 0, \quad |\lambda| > 2\pi W, \tag{1.156}$$

rather than (1.146), and it is not necessary that

$$f(\lambda) \ne 0, \quad |\lambda| < 2\pi W. \tag{1.157}$$

This means that $W$ is only an "upper bound" and not the exact cut-off frequency of the process. In many practical problems it is very difficult to know the exact cut-off frequency, so the value $W$ is only an estimate which ensures that the condition (1.146) is fulfilled. Of course, overestimating $W$ will increase the number of sample points as well as the computing time in the analysis.

The representation (1.155) shows that for a general class of stochastic stationary processes with finite bandwidth, i.e. those satisfying the condition (1.146), any such process is uniquely determined by the discrete sampling series $\xi_n = \xi(\frac{n}{2W})$, $n = 0, \pm 1, \ldots$, without loss of any information.
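
The interpolation formula (1.155) can be tried on a single band-limited curve. The following is a minimal sketch and not a statement about stochastic processes: it assumes a deterministic test signal $x(t) = \big(\sin(\pi Wt)/(\pi Wt)\big)^2$, whose spectrum vanishes for $|\lambda| > 2\pi W$, and it truncates the infinite series to $|n| \le N$; the function names and the truncation level are hypothetical choices made here.

```python
import math


def sinc(u):
    """Normalized sinc: sin(pi u) / (pi u), with value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)


def bandlimited_signal(t, W):
    """Test signal sinc(W t)^2, band-limited to |lambda| <= 2 pi W."""
    return sinc(W * t) ** 2


def reconstruct(sample, t, W, N):
    """Truncated sampling series: sum over |n| <= N of sample(n / 2W) * sinc(2 W t - n).

    Note that sinc(2 W t - n) equals sin(2 pi W t - n pi) / (2 pi W t - n pi).
    """
    return sum(sample(n / (2.0 * W)) * sinc(2.0 * W * t - n) for n in range(-N, N + 1))


if __name__ == "__main__":
    W = 1.0  # illustrative bandwidth
    for t in (0.0, 0.3, 1.7, -2.45):
        exact = bandlimited_signal(t, W)
        approx = reconstruct(lambda s: bandlimited_signal(s, W), t, W, 400)
        print(t, exact, approx)
```

Only the values of the signal at the sampling instants $n/(2W)$ enter the reconstruction; the remaining discrepancy comes from truncating the series.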
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CHAPTER 2 

ARMA Model and Model Fitting


In this chapter, we introduce a class of stationary series which has been
used in different fields of science, engineering, economics, medicine, etc.; we
generally call this class the ARMA model. The ARMA model is of great interest
not only to researchers in applied fields, but also to theoretical researchers,
since the ARMA model involves very deep mathematical
ideas, e.g. Markovian extension theory, the rational approximation problem, etc.

§2.1 ARMA Model and the Wold Decomposition 

Definition 2.1 (ARMA Model). Suppose that $\xi_t$, $t = 0, \pm 1, \ldots$, $E\xi_t = 0$, is a real valued stationary series which satisfies the following stochastic difference equation:

$$\xi_t + \phi_1\xi_{t-1} + \cdots + \phi_p\xi_{t-p} = \theta_0\varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}, \tag{2.1}$$

where the polynomials

$$\Phi(z) = \sum_{k=0}^{p} \phi_k z^k \ne 0, \quad (\phi_0 = 1), \quad |z| \le 1, \tag{2.2}$$

$$\Theta(z) = \sum_{k=0}^{q} \theta_k z^k \ne 0, \quad (\theta_0 > 0), \quad |z| \le 1, \tag{2.3}$$

have real coefficients and no common factors, and $\varepsilon_t$ is the real standard white noise, i.e., $E\varepsilon_t = 0$, $E\varepsilon_t\varepsilon_s = \delta_{t,s}$; then $\xi_t$ is called an ARMA($p$, $q$) model or ARMA series.

Let $U$ be the backward shift operator, i.e.

$$U^k\xi_t = \xi_{t-k}, \tag{2.4}$$

then (2.1) can be briefly rewritten as

$$\Phi(U)\xi_t = \Theta(U)\varepsilon_t. \tag{2.5}$$

Definition 2.2 (AR Model). In (2.1), if $\theta_1 = \cdots = \theta_q = 0$, then $\xi_t$ is called an AR($p$) model.

Definition 2.3 (MA Model). In (2.1), if $\phi_1 = \cdots = \phi_p = 0$, then $\xi_t$ is called an MA($q$) model.

Example 1. Suppose that $\xi_t$ satisfies the difference equation

$$\xi_t - \frac{1}{2}\xi_{t-1} = 2\varepsilon_t, \tag{2.6}$$

where $\varepsilon_t$ is the standard white noise. Then $\xi_t$ is an AR(1) model, where the root of the polynomial

$$\Phi(z) = 1 - \frac{1}{2}z$$

is $z = 2 > 1$ and $\theta_0 = 2 > 0$.

Example 2. Let $\xi_t$ be a stationary series which satisfies

$$\xi_t = 6\varepsilon_t - 5\varepsilon_{t-1} + \varepsilon_{t-2}, \tag{2.7}$$

then $\xi_t$ is an MA(2) model. The roots of the polynomial

$$\Theta(z) = 6 - 5z + z^2 = (3 - z)(2 - z)$$

are $z_1 = 2$, $z_2 = 3$, and $\theta_0 = 6 > 0$.

Example 3. $\xi_t$ is a stationary series which satisfies

it 一 一 i - 5et-\ 4 - €t-2 (2.8) 

then $\xi_t$ is an ARMA(1,2) model.

Theorem 2.1 (Stochastic Integral Representation of ARMA Model). Let $\xi_t$ be the ARMA($p$, $q$) model whose difference equation is

$$\Phi(U)\xi_t = \Theta(U)\varepsilon_t, \quad t = 0, \pm 1, \pm 2, \ldots, \tag{2.9}$$

then $\xi_t$ can be represented as the following stochastic integral

$$\xi_t = \int_\Pi e^{it\lambda} \frac{\Theta(e^{-i\lambda})}{\Phi(e^{-i\lambda})}\,dZ_\varepsilon(\lambda), \tag{2.10}$$

where $dZ_\varepsilon(\lambda)$ is the orthogonal stochastic measure of $\varepsilon_t$.

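Proof. First, we want to show that the integral (2.10) is a stationary series. In fact, the integral is well defined since the integrand of (2.10) is bounded, i.e.

$$\Big|e^{it\lambda} \frac{\Theta_\lambda}{\Phi_\lambda}\Big| \le M < +\infty,$$

where $\Theta_\lambda = \Theta(e^{-i\lambda})$, $\Phi_\lambda = \Phi(e^{-i\lambda})$. Furthermore, we check the shift invariance of the covariance function as follows:

$$\begin{aligned}
R(t + \tau, t) &= E\xi_{t+\tau}\bar{\xi}_t = (\xi_{t+\tau}, \xi_t)_H \\
&= \Big(\int_\Pi e^{i(t+\tau)\lambda} \frac{\Theta_\lambda}{\Phi_\lambda}\,dZ_\varepsilon, \int_\Pi e^{it\lambda} \frac{\Theta_\lambda}{\Phi_\lambda}\,dZ_\varepsilon\Big)_H \\
&= \int_\Pi e^{i(t+\tau)\lambda} \frac{\Theta_\lambda}{\Phi_\lambda} \overline{\Big(e^{it\lambda} \frac{\Theta_\lambda}{\Phi_\lambda}\Big)}\,dF_\varepsilon(\lambda).
\end{aligned} \tag{2.11}$$

We know (see Chapter 1, (1.49)) that for the white noise $\varepsilon_t$,

$$dF_\varepsilon(\lambda) = \frac{1}{2\pi}\,d\lambda,$$

so we can rewrite (2.11) as

$$R(t + \tau, t) = \frac{1}{2\pi} \int_\Pi e^{i\tau\lambda} \Big|\frac{\Theta_\lambda}{\Phi_\lambda}\Big|^2 d\lambda, \tag{2.12}$$

which does not depend on $t$, and which also shows by the Wiener-Khinchin formula that the spectral density of $\xi_t$ exists:

$$f(\lambda) = \frac{1}{2\pi} \Big|\frac{\Theta_\lambda}{\Phi_\lambda}\Big|^2, \quad \lambda \in \Pi. \tag{2.13}$$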

Now, we shall show that $\xi_t$ determined by (2.10) satisfies the ARMA equation (2.9). In fact, we only need to replace the $\xi_t$ of (2.9) by the stochastic integral (2.10):

$$\begin{aligned}
\sum_{k=0}^{p} \phi_k\xi_{t-k} &= \int_\Pi e^{it\lambda} \Big(\sum_{k=0}^{p} \phi_k e^{-ik\lambda}\Big) \frac{\Theta_\lambda}{\Phi_\lambda}\,dZ_\varepsilon \\
&= \int_\Pi e^{it\lambda} \Big(\sum_{k=0}^{q} \theta_k e^{-ik\lambda}\Big) dZ_\varepsilon \\
&= \sum_{k=0}^{q} \theta_k\varepsilon_{t-k},
\end{aligned} \tag{2.14}$$

and the last equality shows that (2.10) is a stationary solution of equation (2.9). ■

Moreover, we want to point out that (2.10) is the unique integral representation of the solution of (2.9). The proof is omitted here, since a rigorous proof needs some results from measure theory.

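Corollary. Let $\xi_t$ be an ARMA model whose equation is

$$\Phi(U)\xi_t = \Theta(U)\varepsilon_t,$$

then the spectral density is

$$f_\xi(\lambda) = \frac{1}{2\pi} \Big|\frac{\Theta_\lambda}{\Phi_\lambda}\Big|^2, \quad \lambda \in \Pi.$$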
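
The spectral density formula of the Corollary is easy to evaluate. The following is a minimal sketch with hypothetical function names, applied to the AR(1) model of Example 1 ($\phi_1 = -\frac{1}{2}$, $\theta_0 = 2$); it also integrates the density over $[-\pi, \pi]$ with a simple midpoint rule, which should return the variance $E\xi_t^2 = \theta_0^2/(1 - \phi_1^2) = 16/3$ of that model.

```python
import cmath
import math


def arma_spectral_density(phi, theta, lam):
    """f(lam) = |Theta(e^{-i lam}) / Phi(e^{-i lam})|^2 / (2 pi).

    phi = [1, phi_1, ..., phi_p] and theta = [theta_0, ..., theta_q] are the
    coefficient lists of the polynomials Phi(z) and Theta(z).
    """
    z = cmath.exp(-1j * lam)
    numerator = sum(c * z ** k for k, c in enumerate(theta))
    denominator = sum(c * z ** k for k, c in enumerate(phi))
    return abs(numerator / denominator) ** 2 / (2.0 * math.pi)


def variance_from_density(phi, theta, grid=2000):
    """Midpoint-rule approximation of the integral of f over [-pi, pi]."""
    step = 2.0 * math.pi / grid
    return step * sum(
        arma_spectral_density(phi, theta, -math.pi + (j + 0.5) * step)
        for j in range(grid)
    )


if __name__ == "__main__":
    phi, theta = [1.0, -0.5], [2.0]  # Example 1: xi_t - 0.5 xi_{t-1} = 2 eps_t
    print(arma_spectral_density(phi, theta, 0.0))  # density at frequency zero
    print(variance_from_density(phi, theta))       # close to 16/3
```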


Theorem 2.2 (Wold Decomposition of ARMA Model). Suppose that the ARMA series $\xi_t$ satisfies the equation

$$\Phi(U)\xi_t = \Theta(U)\varepsilon_t,$$

then $\xi_t$ can be uniquely represented by the following series:

$$\xi_t = \sum_{k=0}^{\infty} c_k\varepsilon_{t-k}, \tag{2.15}$$

where $\{c_k, k \ge 0\}$ are the coefficients of the Taylor series of

$$\Gamma(z) = \frac{\Theta(z)}{\Phi(z)} = \sum_{k=0}^{\infty} c_k z^k, \quad |z| \le 1, \tag{2.16}$$

with the condition that

$$\sum_{k=0}^{\infty} |c_k| < +\infty.$$


Proof. According to the definition of the ARMA model, we can rewrite the polynomials $\Phi(z)$, $\Theta(z)$ as

$$\begin{aligned}
\Phi(z) &= \phi_p \prod_{j=1}^{p} (z - \beta_j), \quad |\beta_j| > 1, \; j = 1, 2, \ldots, p, \\
\Theta(z) &= \theta_q \prod_{l=1}^{q} (z - \alpha_l), \quad |\alpha_l| > 1, \; l = 1, 2, \ldots, q,
\end{aligned} \tag{2.17}$$

hence $\Gamma(z)$ is analytic in $|z| < \rho$ for some $\rho > 1$.

So the Taylor series

$$\Gamma(z) = \sum_{k=0}^{\infty} c_k z^k$$

is absolutely convergent in $|z| \le 1$, and

$$\sum_{k=0}^{\infty} |c_k| < +\infty.$$

Furthermore, by Theorem 2.1 we have

$$\xi_t = \int_\Pi e^{it\lambda}\,\Gamma(e^{-i\lambda})\,dZ_\varepsilon = \sum_{k=0}^{\infty} c_k \int_\Pi e^{i\lambda(t-k)}\,dZ_\varepsilon = \sum_{k=0}^{\infty} c_k\varepsilon_{t-k},$$

and the uniqueness of the coefficients is seen from the Taylor expansion, which is uniquely determined by $\Phi(z)$ and $\Theta(z)$. ■

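In the sequel, the series representation (2.15) of ARMA models will be called the "Wold decomposition" of $\xi_t$, and $\{c_k, k \ge 0\}$ the Wold coefficients.

Under the conditions of Theorem 2.2, the reader can easily obtain the following expression

$$\varepsilon_t = \sum_{k=0}^{\infty} d_k\xi_{t-k}, \tag{2.18a}$$

where $\sum_{k=0}^{\infty} |d_k| < +\infty$, $\{d_k\}$ being the coefficients of the Taylor series of

$$\Gamma^{-1}(z) = \frac{\Phi(z)}{\Theta(z)} = \sum_{k=0}^{\infty} d_k z^k$$

in $|z| \le 1$.

Hence, (2.18a) can be rewritten as a stochastic integral

$$\varepsilon_t = \int_\Pi e^{it\lambda} \Big(\sum_{k=0}^{\infty} d_k e^{-ik\lambda}\Big) dZ_\xi(\lambda) = \int_\Pi e^{it\lambda}\,\Gamma^{-1}(e^{-i\lambda})\,dZ_\xi(\lambda). \tag{2.18b}$$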



§2.2 Orthogonal Basis in Hilbert Space

The Wold decomposition (2.15) of ARMA models is extremely useful in time
series analysis, especially in forecasting and filtering theory and methods. The
Wold series (2.15) possesses very important properties, which are stated as
follows:

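Theorem 2.3 (Orthogonal Basis in $H_\xi$ of ARMA Models). Suppose that $\xi_t$ is an ARMA($p$, $q$) model whose equation is

$$\Phi(U)\xi_t = \Theta(U)\varepsilon_t$$

and whose Wold decomposition is

$$\xi_t = \sum_{k=0}^{\infty} c_k\varepsilon_{t-k},$$

then we have:

1. $H_\xi(t) = \mathcal{L}\{\varepsilon_s, s \le t\}$.

2. $\varepsilon_t \in H_\xi(t) \ominus H_\xi(t - 1)$.*

3. $\{\varepsilon_t\}$ is a complete and normalized orthogonal basis of $H_\xi$.

4. $c_0 > 0$.

*It means that $\varepsilon_t \in H_\xi(t)$, $\varepsilon_t \perp H_\xi(t - 1)$.

Proof.

1. Since for the ARMA model we have, by (2.18a),

$$\varepsilon_t = \sum_{k=0}^{\infty} d_k\xi_{t-k}, \tag{2.19}$$

so we have $\varepsilon_t \in H_\xi(t)$, and hence $\mathcal{L}\{\varepsilon_s, s \le t\} \subset H_\xi(t)$; conversely, the Wold decomposition (2.15) gives $\xi_t \in \mathcal{L}\{\varepsilon_s, s \le t\}$, so that

$$H_\xi(t) = \mathcal{L}\{\varepsilon_s, s \le t\}. \tag{2.20}$$

2. For any $k > 0$, by the Wold decomposition (2.15) we have

$$(\xi_{t-k}, \varepsilon_t)_H = \Big(\sum_{l=0}^{\infty} c_l\varepsilon_{t-k-l}, \varepsilon_t\Big)_H = \sum_{l=0}^{\infty} c_l(\varepsilon_{t-k-l}, \varepsilon_t)_H = 0, \quad \text{for any } k > 0, \tag{2.21}$$

since $t - (k + l) < t$.

(2.21) shows that $\varepsilon_t \perp \xi_{t-k}$, $\forall k > 0$, and so we have

$$\varepsilon_t \perp H_\xi(t - 1). \tag{2.22}$$

3. The normalized and orthogonal properties of $\{\varepsilon_t\}$ are evident; we only need to prove that it is a complete basis of $H_\xi$. In fact, by the definition of $\xi_t$ we have $\xi_t \in H_\varepsilon = \mathcal{L}\{\varepsilon_t\}$; conversely, by (2.18a), we have $H_\varepsilon \subset H_\xi$, then

$$H_\xi = H_\varepsilon \tag{2.23}$$

is true.

Now, suppose that $\xi \in H_\xi$ and

$$(\xi, \varepsilon_t) = 0, \quad \text{for any } t, \tag{2.24}$$

then we can represent $\xi$ as

$$\xi = \lim_{n\to\infty} \sum_{k=0}^{n} c_k^{(n)} \varepsilon_{t_k^{(n)}}, \tag{2.25}$$

and

$$(\xi, \xi) = \lim_{n\to\infty} \sum_{k=0}^{n} \overline{c_k^{(n)}} \big(\xi, \varepsilon_{t_k^{(n)}}\big) = 0 \tag{2.26}$$

by (2.24).

(2.26) means that $\|\xi\| = 0$, or $\xi = 0$ (in $H_\xi$), that is to say, there is no non-zero element in $H_\xi$ which is orthogonal to the basis $\{\varepsilon_t\}$; hence $\{\varepsilon_t\}$ is complete in $H_\xi$.

4. By the definition of $\{c_k\}$ (see (2.15)) and of the ARMA model, we know that

$$c_0 = \Gamma(0) = \frac{\Theta(0)}{\Phi(0)} = \theta_0 > 0. \tag{2.27}$$

That ends the proof of the theorem. ■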

Corollary. Under the conditions of Theorem 2.3, the Wold coefficients

$$c_k = (\xi_t, \varepsilon_{t-k}), \quad k \ge 0, \tag{2.28}$$

are the coordinates of $\xi_t$ in $H_\xi$ under the orthogonal basis $\{\varepsilon_t\}$.

Proof. By the Wold decomposition

$$\xi_t = \sum_{k=0}^{\infty} c_k\varepsilon_{t-k},$$

we can substitute it into the following inner product and obtain

$$(\xi_t, \varepsilon_{t-l}) = \sum_{k=0}^{\infty} c_k(\varepsilon_{t-k}, \varepsilon_{t-l}) = c_l, \quad l \ge 0. \; ■$$


Comparing the two decompositions (2.15) and (1.113), one can easily find that

$$\xi_t = \sum_{k=0}^{\infty} c_k\varepsilon_{t-k} = \sum_{k=-\infty}^{\infty} a_k u_{t-k} \tag{2.29}$$

holds, where $\{u_t\}$ denotes the white noise of the representation (1.113). However, the essential difference between these two series is that the series of $\{u_t\}$ does not in general possess the first and second properties of Theorem 2.3.

Now, if we consider (2.29) as the input-output relationship of two linear systems, the Wold series shows that for obtaining the output $\xi_t$ we only need the present and past inputs

$$\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots,$$

whereas for the second series in (2.29) we need not only the present and past inputs

$$u_t, u_{t-1}, u_{t-2}, \ldots,$$

but also the future inputs

$$u_{t+1}, u_{t+2}, \ldots.$$

That means the input-output relationship of the first system is "causal", and that of the second one is "non-causal". These results are very useful and very important in practical applications.

The following theorem reveals further that $\varepsilon_t$ in the ARMA model is an innovation series.


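Theorem 2.4a (Innovation Series). Suppose that $\xi_t$ is an ARMA series whose equation is

$$\Phi(U)\xi_t = \Theta(U)\varepsilon_t,$$

then

$$\varepsilon_t = \frac{\xi_t - \hat{\xi}_{t-1,1}}{c_0}, \quad t = 0, \pm 1, \pm 2, \ldots, \tag{2.30}$$

where

$$\hat{\xi}_{t-1,1} = \mathop{\mathrm{Proj}}_{H_\xi(t-1)}(\xi_t), \tag{2.31}$$

$$H_\xi(t - 1) = \mathcal{L}\{\xi_s, s \le t - 1\}. \tag{2.32}$$

Proof. By the Wold decomposition (2.15), we can rewrite $\xi_t$ as

$$\xi_t = c_0\varepsilon_t \oplus \sum_{k=1}^{\infty} c_k\varepsilon_{t-k}, \tag{2.33}$$

where the symbol "$\oplus$" denotes the orthogonal summation.

Hence, we have

$$\mathop{\mathrm{Proj}}_{H_\xi(t-1)}(\xi_t) = \sum_{k=1}^{\infty} c_k\varepsilon_{t-k},$$

since the second part of (2.33) belongs to $H_\xi(t - 1)$ and $\varepsilon_t \perp H_\xi(t - 1)$; so we have

$$c_0\varepsilon_t = \xi_t - \mathop{\mathrm{Proj}}_{H_\xi(t-1)}(\xi_t),$$

and

$$c_0 = \Big\|\xi_t - \mathop{\mathrm{Proj}}_{H_\xi(t-1)}(\xi_t)\Big\| > 0, \tag{2.34}$$

$$\varepsilon_t = \frac{\xi_t - \hat{\xi}_{t-1,1}}{c_0}; \tag{2.35}$$

that ends the proof. ■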


The equality (2.35) shows that the white noise $\{\varepsilon_t\}$ in the Wold series of an ARMA model is an innovation series, i.e. $\varepsilon_t$ is the residual component of $\xi_t$ which cannot be predicted exactly from the past data $\{\xi_s, s \le t - 1\}$.

Furthermore, (2.34) shows that the first coefficient $c_0$ in the Wold series is the "One-step ahead Prediction Error" (O.P.E.) in $H_\xi$; together with the fact $c_0 > 0$, this means that it is impossible to make a prediction for an ARMA model with "zero error" under the norm (2.34).
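
The role of $c_0$ as the one-step prediction error can be illustrated by simulation. The following is a minimal sketch, assuming the AR(1) model of Example 1 with Gaussian white noise, for which the projection of $\xi_t$ on the past reduces to a multiple of $\xi_{t-1}$ (a simplification specific to AR(1), not a general prediction algorithm); the function names are hypothetical. A least-squares fit on one simulated record should give a coefficient close to $\frac{1}{2}$ and a root-mean-square residual close to $c_0 = \theta_0 = 2$.

```python
import random


def simulate_ar1(n, a=0.5, c0=2.0, seed=0, burn_in=200):
    """Simulate xi_t = a * xi_{t-1} + c0 * eps_t with standard Gaussian eps_t.

    The first burn_in values are discarded so that the record is close to stationary.
    """
    rng = random.Random(seed)
    xi, record = 0.0, []
    for t in range(n + burn_in):
        xi = a * xi + c0 * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            record.append(xi)
    return record


def one_step_fit(x):
    """Least-squares coefficient b of x_t on x_{t-1}, and the RMS of x_t - b * x_{t-1}."""
    pairs = [(x[t], x[t - 1]) for t in range(1, len(x))]
    b = sum(cur * prev for cur, prev in pairs) / sum(prev * prev for _, prev in pairs)
    rss = sum((cur - b * prev) ** 2 for cur, prev in pairs)
    return b, (rss / len(pairs)) ** 0.5


if __name__ == "__main__":
    record = simulate_ar1(50000, seed=3)
    coefficient, prediction_error = one_step_fit(record)
    print(coefficient, prediction_error)
```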


Definition 2.4 (Non-singularity). A stationary series is called a non-singular
series if the O.P.E. $c_0 > 0$, and a singular series otherwise.


The following theorem shows that for a non-singular stationary series (not necessarily an ARMA model), (2.35) in general defines an orthogonal series.

Theorem 2.4b (General Innovation Series). Suppose that $\xi_t$ is a non-singular stationary series, put

$$H_\xi(t) = \mathcal{L}\{\xi_s, s \le t\}, \quad t = 0, \pm 1, \pm 2, \ldots,$$

and define the series $\varepsilon_t$ as

$$\varepsilon_t = \frac{\xi_t - \hat{\xi}_{t-1,1}}{\|\xi_t - \hat{\xi}_{t-1,1}\|}, \quad t = 0, \pm 1, \pm 2, \ldots,$$

then $\{\varepsilon_t\}$ is an orthogonal series.

*In the sequel, we denote the projection of an element $\xi$ onto a sub-Hilbert space $Z$ as $\mathrm{Proj}_Z(\xi)$.

Proof. We only need to prove that for any integers $t \ne s$, $E\varepsilon_t\varepsilon_s = 0$, since $E\varepsilon_t = 0$ is evident from $E\xi_t = 0$.

Without loss of generality, we may assume $s < t$, and put $n = t - s$. According to the definition of $\varepsilon_t$, we know that

$$\varepsilon_t \perp H_\xi(t - 1)$$

and

$$\varepsilon_t \perp H_\xi(t - k) \subset H_\xi(t - 1), \quad k \ge 1.$$

Hence,

$$\varepsilon_t \perp \varepsilon_{t-n} = c_0^{-1}(\xi_{t-n} - \hat{\xi}_{t-n-1,1})$$

on account of $\varepsilon_t \perp H_\xi(t - n)$ and $\varepsilon_t \perp H_\xi(t - n - 1)$. Thus $E\varepsilon_t\varepsilon_s = 0$. ■


The following theorem shows a recursive procedure for getting the Wold coefficients
$\{c_k, k \ge 0\}$, which is much preferable to the Taylor expansion of $\Gamma(z)$ (see
(2.16)).


Theorem 2.5 (Recursive Procedure for Wold Coefficients). Let the equation
of the ARMA model $\xi_t$ be

$$\sum_{k=0}^{p}\phi_k\xi_{t-k} = \sum_{k=0}^{q}\theta_k\varepsilon_{t-k},$$

then the coefficients of the Wold series can be obtained from the parameters of the
ARMA model by the following procedure:

$$\begin{aligned}
c_0 &= \theta_0, \\
c_1 &= \theta_1 - \phi_1 c_0, \\
&\ \ \vdots \\
c_k &= \theta_k - \phi_1 c_{k-1} - \phi_2 c_{k-2} - \cdots - \phi_k c_0,
\end{aligned} \tag{2.36}$$

where we assume that

$$\theta_k = 0 \ \text{ if } k > q; \qquad \phi_k = 0 \ \text{ if } k > p. \tag{2.37}$$

Proof. By the definition of $\{c_k, k \ge 0\}$, the following equality

$$\sum_{k=0}^{\infty} c_k z^k = \frac{\Theta(z)}{\Phi(z)} \tag{2.38}$$

holds. (2.38) gives

$$\sum_{k=0}^{q}\theta_k z^k = \sum_{k=0}^{\infty}\sum_{s=0}^{p} c_k\phi_s z^{k+s}. \tag{2.39}$$

Putting $l = k + s$ and interchanging the order of summation in (2.39), we have

$$\sum_{k=0}^{\infty}\theta_k z^k = \sum_{l=0}^{\infty}\Big(\sum_{k=0}^{l} c_k\phi_{l-k}\Big) z^l. \tag{2.40}$$

Comparing the coefficients on both sides of the equality (2.40), we see that

$$\theta_l = \sum_{k=0}^{l} c_k\phi_{l-k}, \quad l = 0, 1, 2, \ldots \tag{2.41}$$

are true and equivalent to (2.36) with $\phi_0 = 1$. ■
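
The recursion (2.36) is easy to mechanize. The following is a minimal Python sketch, not part of the original text: the function name `wold_coefficients` and the ARMA(1,1) parameters used in the example are hypothetical, chosen only for illustration. It assumes $\phi_0 = 1$ and applies the zero-padding convention (2.37).

```python
def wold_coefficients(phi, theta, n):
    """Return [c_0, ..., c_{n-1}] computed by the recursion (2.36).

    phi   = [phi_0 (= 1), phi_1, ..., phi_p]   (AR side of the model)
    theta = [theta_0, theta_1, ..., theta_q]   (MA side of the model)
    Coefficients beyond p or q are treated as zero, as in (2.37).
    """
    c = []
    for k in range(n):
        value = theta[k] if k < len(theta) else 0.0
        for j in range(1, k + 1):
            phi_j = phi[j] if j < len(phi) else 0.0
            value -= phi_j * c[k - j]
        c.append(value)
    return c


# Hypothetical example: xi_t - 0.5 xi_{t-1} = eps_t + 0.4 eps_{t-1}
coeffs = wold_coefficients([1.0, -0.5], [1.0, 0.4], 5)
print(coeffs)
```

For this illustrative model, $c_0 = 1$, $c_1 = 0.4 + 0.5 = 0.9$, and every later coefficient is half of the previous one.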

Example 4. Let be the ARMA(1,2) model as 

ft 一 T^t-i = — Set-i + £t- 2 • 

4 

Now, we want to calculate the coefficients Co, Ci, C2, C3. 

The parameters of the model are 4>q = 1, <j>i = 0o = 6 f = —5, O 2 = 1, 
then, by the recursive formula (2.36), we have 

co =0q = 6 

ci = 0 \ - c 0 <t>i = -5 - 6 ( 一 S )= ― 

c 2 = 沒 2 — c l^l ~ c o 4>2 = 1 ~ (一 $) 4^ = 8 

C3 = 设 3 一 C2<^1 = —• 


Example 5. Suppose that 心 is ARMA(l, 1) model as 


- 


—e* — —er#_ 
3 3 
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we want to obtain the Wold decomposition of ft. 

The parameters of the model are 4>o = 1, 设 o = §，沒 1 = — |， then we 

have 

A 2 
Co =00 = - 

1 

Cl =01 — CQ<t>\ = -^2 
c 2 =9 2 — Ci(j>i = 


and in general, we have 


3 n 

Hence, we have the Wold decomposition of ft 


6 




E 


3 fc+i 


^t-k- 


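§2.3 The Covariance Function of ARMA Model and Yule-Walker Equation

The following theorem shows that the covariance function (C.F.) of an ARMA
model satisfies a difference equation.

Theorem 2.6 (Difference Equation of C.F. for ARMA Model). Let $\xi_t$ be
an ARMA model as

$$\sum_{k=0}^{p}\phi_k\xi_{t-k} = \sum_{k=0}^{q}\theta_k\varepsilon_{t-k},$$

then the covariance function $R(n)$ of $\xi_t$ satisfies the following difference equation:

$$\sum_{k=0}^{p}\phi_k R(n-k) = \begin{cases} \displaystyle\sum_{k=n}^{q}\theta_k c_{k-n}, & 0 \le n \le q, \\ 0, & n > q, \end{cases} \tag{2.42}$$

where $\{c_k, k \ge 0\}$ are the Wold coefficients of $\xi_t$.


Proof. For the ARMA equation

$$\sum_{k=0}^{p}\phi_k\xi_{n+l-k} = \sum_{k=0}^{q}\theta_k\varepsilon_{n+l-k}, \tag{2.43}$$

we multiply each side of (2.43) by $\xi_l$ and take the expectation, i.e.

$$\sum_{k=0}^{p}\phi_k E(\xi_{n+l-k}\xi_l) = \sum_{k=0}^{q}\theta_k E(\varepsilon_{n+l-k}\xi_l).$$

So, we have

$$\sum_{k=0}^{p}\phi_k R(n-k) = \begin{cases} \displaystyle\sum_{k=n}^{q}\theta_k c_{k-n}, & 0 \le n \le q, \\ 0, & n > q, \end{cases}$$

since by (2.28), for $0 \le k \le q$, the following equalities

$$(\varepsilon_{n+l-k}, \xi_l) = (\varepsilon_{l-(k-n)}, \xi_l) = \begin{cases} c_{k-n}, & k - n \ge 0, \\ 0, & k - n < 0 \ \text{(in particular whenever } n > q\text{)}, \end{cases} \tag{2.44}$$

hold. ■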

Corollary 1 (Truncated Property of C.F. of MA Model). Suppose that $\xi_t$
is an MA($q$) model, then its covariance function $R(n)$ can be represented by

$$R(n) = \begin{cases} \displaystyle\sum_{k=n}^{q}\theta_k\theta_{k-n}, & 0 \le n \le q, \\ 0, & n > q. \end{cases} \tag{2.45}$$

Proof. We can consider the MA($q$) model as an ARMA($0, q$) model, and by
Theorem 2.6 we have

$$R(n) = \begin{cases} \displaystyle\sum_{k=n}^{q}\theta_k c_{k-n}, & 0 \le n \le q, \\ 0, & n > q. \end{cases} \tag{2.46}$$

Now, by the recursive formula (2.36) with $\phi_k = 0$, $k > 0$, we have $c_k = \theta_k$, $0 \le k \le q$.
Substituting $\theta_k$ for $c_k$ in (2.46), the result is evident. ■

The formula (2.45) is very important since it shows that the C.F. of every MA($q$)
model is truncated at the lag $n = q$, i.e.

$$R(n) = 0, \quad n > q. \tag{2.47}$$
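
The truncation property can be checked numerically. The sketch below is an illustration only: the function name `ma_covariance` and the MA(2) coefficients are hypothetical, and the white noise is assumed to have unit variance, as in formula (2.45).

```python
def ma_covariance(theta, n):
    """Covariance R(n) of an MA(q) model by formula (2.45).

    theta = [theta_0, ..., theta_q]; the driving white noise is assumed
    to be normalized (unit variance).
    """
    q = len(theta) - 1
    n = abs(n)                      # R(-n) = R(n) for a real series
    if n > q:
        return 0.0                  # truncation (2.47)
    return sum(theta[k] * theta[k - n] for k in range(n, q + 1))


# Hypothetical MA(2): xi_t = eps_t + 0.5 eps_{t-1} + 0.25 eps_{t-2}
theta = [1.0, 0.5, 0.25]
covariances = [ma_covariance(theta, n) for n in range(5)]
print(covariances)
```

All lags beyond $q = 2$ come out exactly zero, which is the property (2.47).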

In fact, it can be proved further that the condition (2.47) is also a sufficient
condition for $\xi_t$ to be an MA($q$) model, if its spectral density $f(\lambda) \ne 0$, $\lambda \in \Pi$.

Corollary 2 (Yule-Walker Equation). Let $\xi_t$ be an AR($p$) model, its equation
is

$$\sum_{k=0}^{p}\phi_k\xi_{t-k} = \theta_0\varepsilon_t,$$

then the C.F. $R(n)$ of $\xi_t$ satisfies the following Yule-Walker equation

$$\begin{pmatrix} R_0 & R_1 & R_2 & \cdots & R_p \\ R_1 & R_0 & R_1 & \cdots & R_{p-1} \\ \vdots & & & & \vdots \\ R_p & R_{p-1} & R_{p-2} & \cdots & R_0 \end{pmatrix} \begin{pmatrix} 1 \\ \phi_1 \\ \vdots \\ \phi_p \end{pmatrix} = \begin{pmatrix} \theta_0^2 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \tag{2.48}$$

$$R(n) = -\sum_{k=1}^{p}\phi_k R(n-k), \quad n \ge 1, \tag{2.49}$$

where $R_k = R(k)$, $k = 0, 1, 2, \ldots, p$.


Proof. Consider the AR($p$) model as an ARMA($p, 0$) model, then by Theorem 2.6 we
have

$$\sum_{k=0}^{p}\phi_k R(n-k) = \begin{cases} \theta_0^2, & n = 0, \\ 0, & n > 0. \end{cases} \tag{2.50}$$

Now, rewriting (2.50) in matrix form, we obtain (2.48), and
(2.49) is the second equation of (2.50). ■

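Remark 1. In some books, the following equations are called the Yule-Walker
equation:

$$\begin{pmatrix} R_0 & R_1 & \cdots & R_{p-1} \\ R_1 & R_0 & \cdots & R_{p-2} \\ \vdots & & & \vdots \\ R_{p-1} & R_{p-2} & \cdots & R_0 \end{pmatrix} \begin{pmatrix} \phi_1^{(p)} \\ \phi_2^{(p)} \\ \vdots \\ \phi_p^{(p)} \end{pmatrix} = \begin{pmatrix} R_1 \\ R_2 \\ \vdots \\ R_p \end{pmatrix}, \tag{2.51}$$

where $R_k = R(k)$ and

$$\theta_0^2 = R_0 - \sum_{k=1}^{p}\phi_k^{(p)} R_k. \tag{2.52}$$

In fact, if we put

$$\phi_k^{(p)} = -\phi_k, \quad k = 1, 2, \ldots, p, \tag{2.53}$$

then (2.48) can be resolved into two parts which are just the same as (2.51)
and (2.52).

Put

$$\Gamma_p = \big(R_{|i-j|}\big)_{i,j=1}^{p}, \qquad \boldsymbol\phi_p = \big(\phi_1^{(p)}, \ldots, \phi_p^{(p)}\big)^T, \qquad \boldsymbol\gamma_p = (R_1, \ldots, R_p)^T, \tag{2.54}$$

then the Yule-Walker equation can be rewritten as

$$\Gamma_p\boldsymbol\phi_p = \boldsymbol\gamma_p, \tag{2.55}$$

$$\Gamma_{p+1}\begin{pmatrix} 1 \\ -\boldsymbol\phi_p \end{pmatrix} = \begin{pmatrix} \theta_0^2 \\ \mathbf{0} \end{pmatrix}; \tag{2.56}$$

the symmetric matrix $\Gamma_p$ or $\Gamma_{p+1}$ is a Toeplitz matrix.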

Remark 2. For a non-singular stationary series $\xi_t$, if its covariance function
satisfies the following difference equation,

$$\sum_{k=0}^{p}\phi_k R(n-k) = 0, \quad n \ge 1,$$

where $\{\phi_1, \phi_2, \ldots, \phi_p\}$ are real numbers and

$$\Phi(z) = \sum_{k=0}^{p}\phi_k z^k \ne 0, \quad |z| \le 1 \quad (\phi_0 = 1),$$

then $\xi_t$ must be an AR($p$) model.

In fact, put

$$\eta_t = \sum_{k=0}^{p}\phi_k\xi_{t-k},$$

then for $n > 0$,

$$E\eta_{t-n}\eta_t = \sum_{s=0}^{p}\phi_s\Big(\sum_{j=0}^{p}\phi_j R(n - j + s)\Big) = 0,$$

i.e. $\eta_t$ is a white noise. Moreover, since $\xi_t$ is non-singular,

$$E|\eta_t|^2 = E\Big|\xi_t + \sum_{k=1}^{p}\phi_k\xi_{t-k}\Big|^2 = \theta_0^2 > 0.$$

Now, letting $\varepsilon_t = \eta_t/\theta_0$, we have

$$\sum_{k=0}^{p}\phi_k\xi_{t-k} = \theta_0\varepsilon_t,$$

which shows that $\xi_t$ is an AR($p$) model.

The Yule-Walker equation plays a very important role in time series analysis. The
following theorem gives a simple recursive algorithm for solving the Yule-Walker
equation (2.55) based on the properties of the Toeplitz matrix.

Theorem 2.7 (Levinson Recursive Algorithm). Let $\xi_t$ be a non-singular stationary
series, and keep the notations $\Gamma_p$, $\boldsymbol\phi_p$, $\boldsymbol\gamma_p$ as in (2.54), then the solution $\boldsymbol\phi_p$
of the Yule-Walker equation

$$\Gamma_p\boldsymbol\phi_p = \boldsymbol\gamma_p \tag{2.57}$$

can be obtained by the following recursive algorithm for $k = 1, 2, \ldots$:

$$\begin{aligned}
\phi_1^{(1)} &= R(1)/R(0), \\
\phi_{k+1}^{(k+1)} &= \Big(R(k+1) - \sum_{j=1}^{k} R(k+1-j)\phi_j^{(k)}\Big)\Big(R(0) - \sum_{j=1}^{k} R(j)\phi_j^{(k)}\Big)^{-1}, \\
\phi_j^{(k+1)} &= \phi_j^{(k)} - \phi_{k+1}^{(k+1)}\phi_{k-j+1}^{(k)}, \quad 1 \le j \le k.
\end{aligned} \tag{2.58}$$

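Proof. Put

$$\mathbf{a}_k^T = (R(k), \ldots, R(1)), \qquad \mathbf{b}_k^T = \big(\phi_1^{(k)}, \ldots, \phi_k^{(k)}\big), \qquad \mathbf{c}_{k+1}^T = \big(\phi_1^{(k+1)}, \ldots, \phi_k^{(k+1)}\big), \tag{2.59}$$

and let $T$ be the $k \times k$ matrix with ones on the anti-diagonal and zeros elsewhere, so that $T$ reverses the order of the components of a vector.

First of all, we want to show that the inverse matrix $\Gamma_k^{-1}$ always exists. Suppose it
does not; then one can find a vector $\mathbf{v} \ne 0$ such that

$$\mathbf{v}^T\Gamma_k\mathbf{v} = E\Big|\sum_{j=1}^{k} v_j\xi_{k-j+1}\Big|^2 = 0, \tag{2.60}$$

and without loss of generality, we may assume that $v_1 \ne 0$, then rewrite (2.60) as

$$\mathbf{v}^T\Gamma_k\mathbf{v} = v_1^2\Big\|\xi_k - \sum_{j=2}^{k}\beta_j\xi_{k-j+1}\Big\|^2 = 0, \tag{2.61}$$

where $\beta_j = -v_j/v_1$, $j = 2, 3, \ldots, k$.

The result (2.61) would show that $\xi_k$ could be predicted with “no error” from the
past data $(\xi_{k-1}, \ldots, \xi_2, \xi_1)$, so that the series $\xi_t$ is singular according to Definition
2.4, which contradicts the condition of this theorem; hence the inverse of $\Gamma_k$
exists.

Secondly, the following equalities hold:

$$T^{-1} = T, \qquad T\Gamma_k T = \Gamma_k, \qquad T\mathbf{a}_k = \boldsymbol\gamma_k. \tag{2.62}$$

Then the Yule-Walker equation can be represented as

$$\Gamma_{k+1}\boldsymbol\phi_{k+1} = \boldsymbol\gamma_{k+1}, \tag{2.63}$$

i.e.

$$\begin{pmatrix} \Gamma_k & \mathbf{a}_k \\ \mathbf{a}_k^T & R(0) \end{pmatrix}\begin{pmatrix} \mathbf{c}_{k+1} \\ \phi_{k+1}^{(k+1)} \end{pmatrix} = \begin{pmatrix} \boldsymbol\gamma_k \\ R(k+1) \end{pmatrix}. \tag{2.64}$$

By the block operation of matrices on (2.64), we have

$$\begin{cases} \Gamma_k\mathbf{c}_{k+1} + \phi_{k+1}^{(k+1)}\mathbf{a}_k = \boldsymbol\gamma_k, \\ \mathbf{a}_k^T\mathbf{c}_{k+1} + \phi_{k+1}^{(k+1)}R(0) = R(k+1). \end{cases} \tag{2.65}$$

Using the relationships (2.62), the first equality of (2.65) gives the following
results:

$$\mathbf{c}_{k+1} = \Gamma_k^{-1}\big(\boldsymbol\gamma_k - \phi_{k+1}^{(k+1)}\mathbf{a}_k\big) = \mathbf{b}_k - \phi_{k+1}^{(k+1)}T\mathbf{b}_k. \tag{2.66}$$

The last equality of (2.66) is obtained from

$$\Gamma_k^{-1}\mathbf{a}_k = \Gamma_k^{-1}T\boldsymbol\gamma_k = T\Gamma_k^{-1}\boldsymbol\gamma_k = T\mathbf{b}_k, \tag{2.67}$$

since $\Gamma_k^{-1}T = T\Gamma_k^{-1}$.

Now, substituting (2.66) for $\mathbf{c}_{k+1}$ in the second equation of (2.65), we obtain

$$R(k+1) = \phi_{k+1}^{(k+1)}R(0) + \mathbf{a}_k^T\mathbf{c}_{k+1} = \phi_{k+1}^{(k+1)}\big(R(0) - \mathbf{a}_k^T T\mathbf{b}_k\big) + \mathbf{a}_k^T\mathbf{b}_k, \tag{2.68}$$

thus

$$\phi_{k+1}^{(k+1)} = \big(R(k+1) - \mathbf{a}_k^T\mathbf{b}_k\big)\big(R(0) - \mathbf{a}_k^T T\mathbf{b}_k\big)^{-1} \tag{2.69}$$

holds.

Combining the results (2.69) and (2.66), one sees at once that they are the matrix
forms of the equations (2.58). ■

The recursive algorithm (2.58) for solving the Yule-Walker equation (2.57) is
usually called the Levinson algorithm. It greatly reduces the computing
time in comparison with general algorithms for solving linear equations,
such as Gaussian elimination.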
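
As an illustration of the recursion (2.58), here is a minimal Python sketch. It is not taken from the original text: the function name `levinson` and the test covariances (those of a hypothetical AR(1) series with coefficient 0.5 and unit innovation variance) are assumptions made for the example. The residual variance is computed as $R(0) - \sum_j R(j)\phi_j^{(p)}$, the quantity that appears in the denominator of (2.58).

```python
def levinson(R, p):
    """Solve the Yule-Walker equation (2.57) of order p by the recursion (2.58).

    R = [R(0), R(1), ..., R(p)] are covariance values.
    Returns (phi, sigma2), where phi = [phi_1^(p), ..., phi_p^(p)] and
    sigma2 = R(0) - sum_j R(j) * phi_j^(p).
    """
    if p == 0:
        return [], R[0]
    phi = [R[1] / R[0]]                      # phi_1^(1)
    for k in range(1, p):
        num = R[k + 1] - sum(R[k + 1 - j] * phi[j - 1] for j in range(1, k + 1))
        den = R[0] - sum(R[j] * phi[j - 1] for j in range(1, k + 1))
        last = num / den                     # phi_{k+1}^(k+1)
        # phi_j^(k+1) = phi_j^(k) - phi_{k+1}^(k+1) * phi_{k-j+1}^(k)
        phi = [phi[j - 1] - last * phi[k - j] for j in range(1, k + 1)] + [last]
    sigma2 = R[0] - sum(R[j] * phi[j - 1] for j in range(1, p + 1))
    return phi, sigma2


# Hypothetical check: for x_t = 0.5 x_{t-1} + e_t with Var(e_t) = 1,
# R(k) = 0.5**k / (1 - 0.25).
R = [0.5 ** k / 0.75 for k in range(3)]
phi, sigma2 = levinson(R, 2)
print(phi, sigma2)
```

Each step costs $O(k)$ operations, so the whole solution of order $p$ costs $O(p^2)$, compared with $O(p^3)$ for a general linear solver.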


§2.4 Model Fitting Under the Criterion of One-step Ahead Prediction 
Error 

In this section, we introduce the theory and methods of fitting a model
to practical observation data. One always hopes that the fitted model for
the data will be “as good as possible” for forecasting, filtering or spectral analysis.
Up to now, there are several methods for constructing time series models; some
of them are good for some problems but not for others. So, in each particular
case, selecting a good model is sometimes troublesome and the choice is not easy to know
in advance. A simple principle is perhaps to check the models against the practical
data in some way and select the best one. Sometimes a simple fitted model
shows better performance than more complicated ones in practical
problems. In this section, we shall introduce a simple method for model fitting,
which has been widely used in science and engineering and often shows very
good performance in forecasting and spectral analysis. This method is usually called
the “Maximum Entropy” (M.E.) method.

The basic problem in M.E. can be stated as follows:

Suppose that we can only obtain $p + 1$ values $R(0), R(1), \ldots, R(p)$ of the covariance
function $R(k)$ of a stationary series $\xi_t$; how can we find a stationary series $\eta_t$
such that its covariance function $R_\eta(k)$ satisfies

$$R_\eta(k) = R(k), \quad 0 \le k \le p, \tag{2.70}$$

and which also possesses a certain optimality?

First of all, theoretically, we want to know whether such an $\eta_t$ always exists. Is it unique?
Secondly, if such an $\eta_t$ is not unique, how do we select the “optimum” one?

It is apparent that the optimality must be relative to some criterion.
In the following we shall use the One-step ahead Prediction Error (O.P.E.) criterion,
which can be proved, in some cases, to be equivalent to the M.E. criterion but is more general
than the latter.

The model fitting problem is stated mathematically as follows:

Let $R(0), R(1), \ldots, R(p)$ be $p + 1$ values of a covariance function of a stationary
series and

$$\Gamma_{p+1} > 0. \tag{2.71}$$

Put

$$K_1 = \{\xi_t : \xi_t \text{ is a stationary series, } R_\xi(k) = R(k),\ 0 \le k \le p\}, \tag{2.72}$$

then we want to find a series $\eta_t \in K_1$ such that its O.P.E. is the maximum in $K_1$,
i.e.

$$c_0^{(\eta)} \ge c_0^{(\xi)}, \quad \forall\, \xi_t \in K_1. \tag{2.73}$$

The preceding statement is the model fitting under the criterion of O.P.E.

Now we put

$$K_2 = \Big\{\xi_t : \xi_t \text{ is a stationary series, } R_\xi(k) = R(k),\ 0 \le k \le p,\ \int_\Pi \log f_\xi(\lambda)\,d\lambda > -\infty\Big\}, \tag{2.74}$$

where $f_\xi(\lambda)$ is the spectral density of the stationary series $\xi_t$.
Then we want to select a series $\eta_t \in K_2$ such that

$$I(f_\eta) = \int_\Pi \log f_\eta(\lambda)\,d\lambda \ge \int_\Pi \log f_\xi(\lambda)\,d\lambda = I(f_\xi), \quad \text{for all } \xi_t \in K_2. \tag{2.75}$$

The second statement is the model fitting under the criterion of M.E.

By the definitions of $K_1$ and $K_2$ we know that

$$K_2 \subset K_1, \tag{2.76}$$

since not all of the series in $K_1$ satisfy the inequality

$$\int_\Pi \log f_\xi(\lambda)\,d\lambda > -\infty.$$

However, if we consider the O.P.E. in $K_2$, then the two criteria lead to
the same result in $K_2$. In fact, we will show that

$$c_0^{(\eta)} \ge c_0^{(\xi)},\ \forall\, \xi_t \in K_2, \quad \text{iff} \quad I(f_\eta) \ge I(f_\xi),\ \forall\, \xi_t \in K_2. \tag{2.77}$$

The following theorem reveals the relationship between $I(f)$ and $c_0$. To avoid
the required knowledge of the boundary values of analytic functions, we give
the proof only in the case of an ARMA model; the reader can find the general result
in the book of Rozanov (1967).

Theorem 2.8 (Kolmogorov Formula on O.P.E.). Suppose that $\xi_t$ is an ARMA
model, then the O.P.E. of $\xi_t$ can be represented as

$$c_0 = (2\pi)^{1/2}\exp\Big\{\frac{1}{4\pi}\int_\Pi \log f_\xi(\lambda)\,d\lambda\Big\}, \tag{2.78}$$

where $f_\xi(\lambda)$ is the spectral density of $\xi_t$.


Proof. Suppose that the ARMA equation of $\xi_t$ is

$$\Phi(U)\xi_t = \Theta(U)\varepsilon_t,$$

then we know (see (2.13)) that the spectral density $f_\xi(\lambda)$ can be represented as

$$f_\xi(\lambda) = \frac{1}{2\pi}\big|\Gamma_\xi(e^{-i\lambda})\big|^2, \quad \lambda \in \Pi, \tag{2.79}$$

where

$$\Gamma_\xi(z) = \frac{\Theta(z)}{\Phi(z)}$$

is analytic in $|z| < \rho$, and $\rho > 1$.

Let $\Psi(z)$ be the argument of $\Gamma_\xi(z)$, then we can rewrite $\Gamma_\xi(z)$ as

$$\Gamma_\xi(z) = |\Gamma_\xi(z)|\exp\{i\Psi(z)\} \tag{2.80}$$

and $\log\Gamma_\xi(z) = \log|\Gamma_\xi(z)| + i\,\Psi(z)$.

According to the theory of functions of complex variables, it is well known that
the real part of an analytic function is a harmonic function. Hence, we have

$$\begin{aligned}
\log|\Gamma_\xi(0)| &= \frac{1}{2\pi}\int_\Pi \log\big|\Gamma_\xi(e^{-i\lambda})\big|\,d\lambda = \frac{1}{2\pi}\int_\Pi \log\sqrt{2\pi f_\xi(\lambda)}\,d\lambda \\
&= \log\sqrt{2\pi} + \frac{1}{4\pi}\int_\Pi \log f_\xi(\lambda)\,d\lambda,
\end{aligned} \tag{2.81}$$

where $|\Gamma_\xi(e^{-i\lambda})| = \sqrt{2\pi f_\xi(\lambda)}$ is obtained from (2.79). Rewriting (2.81) in
exponential form, one obtains

$$|\Gamma_\xi(0)| = e^{\log\sqrt{2\pi}}\exp\Big\{\frac{1}{4\pi}\int_\Pi \log f_\xi(\lambda)\,d\lambda\Big\} = \sqrt{2\pi}\exp\Big\{\frac{1}{4\pi}\int_\Pi \log f_\xi(\lambda)\,d\lambda\Big\}, \tag{2.82}$$

and $c_0 = |\Gamma_\xi(0)|$. ■

Remark 1. The formula (2.78) was obtained by Kolmogorov in 1941 under the
condition that the stationary series $\xi_t$ is a “regular series”, i.e. the spectral density
$f_\xi(\lambda)$ exists a.e. in $[-\pi, \pi]$ and satisfies the inequality

$$\int_\Pi \log f_\xi(\lambda)\,d\lambda > -\infty. \tag{2.83}$$

Equivalently, if $\xi_t$ can be represented as

$$\xi_t = \sum_{k=0}^{\infty} c_k\varepsilon_{t-k},$$

where $\varepsilon_t$ satisfies the four conditions of Theorem 2.3, then the O.P.E. formula (2.78)
holds.

Remark 2. By the formula (2.78), maximizing $c_0$ is
equivalent to maximizing $I(f)$, so (2.77) is evident.

Remark 3. Since the set of regular stationary series is a subset of the non-singular
stationary series, the criterion of O.P.E. in $K_1$ is more general than
that of the M.E. in $K_2$.
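
The Kolmogorov formula (2.78) can be verified numerically for a simple model. The sketch below is only an illustration under stated assumptions: the function names, the midpoint quadrature rule, and the AR(1) parameters (coefficient 0.5, $\theta_0 = 2$) are hypothetical choices, not part of the original text. For an AR model with $\Phi(z) \ne 0$ in $|z| \le 1$ one has $c_0 = \theta_0$, so the formula should return a value close to 2.

```python
import cmath
import math


def ar_spectral_density(lam, phi, theta0):
    """f(lambda) = theta0^2 / (2 pi |Phi(e^{-i lambda})|^2), phi = [1, phi_1, ...]."""
    Phi = sum(ph * cmath.exp(-1j * k * lam) for k, ph in enumerate(phi))
    return theta0 ** 2 / (2.0 * math.pi * abs(Phi) ** 2)


def kolmogorov_ope(f, n=4096):
    """Evaluate (2.78): sqrt(2 pi) * exp((1 / 4 pi) * integral of log f over [-pi, pi]).

    The integral is approximated by the midpoint rule with n nodes.
    """
    h = 2.0 * math.pi / n
    integral = h * sum(math.log(f(-math.pi + (i + 0.5) * h)) for i in range(n))
    return math.sqrt(2.0 * math.pi) * math.exp(integral / (4.0 * math.pi))


# Hypothetical AR(1): eta_t - 0.5 eta_{t-1} = 2 eps_t, so c_0 should equal 2.
c0 = kolmogorov_ope(lambda lam: ar_spectral_density(lam, [1.0, -0.5], 2.0))
print(c0)
```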

Now, the following theorem answers the previous model fitting problem. 


Theorem 2.9 (Existence of the Model Fitting). Suppose that $R(0), R(1), \ldots,
R(p)$ are $p+1$ values of the covariance function of a stationary series, and the matrix

$$R_{p+1} = \begin{pmatrix} R(0) & R(1) & \cdots & R(p) \\ R(1) & R(0) & \cdots & R(p-1) \\ \vdots & & & \vdots \\ R(p) & R(p-1) & \cdots & R(0) \end{pmatrix}$$

is positive definite. Then there always exists an AR($p$) model $\eta_t \in K_1$, and its
parameters $(\theta_0, \phi_1, \ldots, \phi_p)$ are determined by the Yule-Walker equation

$$R_{p+1}(1, \phi_1, \ldots, \phi_p)^T = (\theta_0^2, 0, \ldots, 0)^T. \tag{2.86}$$

Proof. By the Yule-Walker equation (2.86) and Cramer's rule applied to the first unknown, we have

$$1 = \frac{\det\begin{pmatrix} \theta_0^2 & R(1)\ \cdots\ R(p) \\ \mathbf{0} & R_p \end{pmatrix}}{\det R_{p+1}} = \frac{\theta_0^2\det R_p}{\det R_{p+1}}, \tag{2.87}$$

so, since $R_{p+1} > 0$, $\det R_{p+1} = (\det R_p)\,\theta_0^2 > 0$, i.e. $\theta_0 > 0$, and by (2.86) the
solution $(\theta_0, \phi_1, \ldots, \phi_p)$ of the Yule-Walker equation exists.

Now, put

$$R_k = \begin{cases} R(k), & 0 \le k \le p, \\ -\displaystyle\sum_{n=1}^{p}\phi_n R_{k-n}, & k > p, \end{cases} \tag{2.88}$$

then

$$\det R_{p+L+1} = (\det R_{p+L})\,\theta_0^2 > 0, \tag{2.89}$$

where

$$R_{p+L+1} = \begin{pmatrix} R_0 & R_1 & \cdots & R_{p+L} \\ R_1 & R_0 & \cdots & R_{p+L-1} \\ \vdots & & & \vdots \\ R_{p+L} & \cdots & R_1 & R_0 \end{pmatrix}. \tag{2.90}$$

Indeed, for $L = 1$, (2.89) is evident.

If $\det R_{p+L} > 0$ for some $L \ge 1$, then we need to prove by mathematical induction that
$\det R_{p+L+1} > 0$, which can be derived as follows:

By the block operation in matrix theory, we get

$$\det R_{p+L+1} = (\det R_{p+L})\big(R_0 - \boldsymbol\gamma^T R_{p+L}^{-1}\boldsymbol\gamma\big), \tag{2.91}$$

where $\boldsymbol\gamma^T = (R_1, R_2, \ldots, R_{p+L})$.

According to the definition of $R_k$ in (2.88), we have

$$R_0 - \boldsymbol\gamma^T R_{p+L}^{-1}\boldsymbol\gamma = R_0 - (R_1, \ldots, R_{p+L})(-\phi_1, \ldots, -\phi_p, 0, \ldots, 0)^T \tag{2.92}$$

$$= R_0 + \sum_{i=1}^{p}\phi_i R_i = \theta_0^2 > 0, \tag{2.93}$$

which proves $\det R_{p+L+1} > 0$ for any $L \ge 1$.

Thus, we have proved that the series

$$R_0, R_1, \ldots, R_p, \ldots \tag{2.94}$$

is a positive definite series. Now, take the mean function

$$a_t = 0, \quad t = 0, \pm 1, \ldots, \tag{2.95}$$

and the covariance between times $s$ and $t$ equal to

$$R_{|t-s|}, \quad s, t = 0, \pm 1, \ldots. \tag{2.96}$$

Since $\{R_k\}$ is positive definite, so too is

$$\big(R_{|t-s|}\big)_{1 \le t, s \le n}. \tag{2.97}$$

According to the existence theorem of Gaussian processes (see Theorem 1.1),
there must exist a Gaussian series $\eta_t$ such that

$$R_\eta(k) = R(k),\ 0 \le k \le p; \qquad R_\eta(k) = R_k,\ p < k. \tag{2.98}$$

(2.98) shows that $\eta_t$ is a stationary Gaussian series, $\eta_t \in K_1$, and

$$\Phi(U)R_\eta(n) = \sum_{k=0}^{p}\phi_k R_\eta(n-k) = 0, \quad \text{for } n > 0. \tag{2.99}$$

The last equality of (2.99) follows from (2.88) and the Yule-Walker equation.
Now, we want to prove that $\eta_t$ is an AR($p$) model.

First, it is not difficult to prove a very important proposition: the solution
$(\phi_1, \phi_2, \ldots, \phi_p)$ of the Yule-Walker equation always possesses the “minimum phase
shifting” property, i.e.

$$\Phi(z) = \sum_{k=0}^{p}\phi_k z^k \ne 0, \quad |z| \le 1. \tag{2.100}$$

Indeed, suppose that this proposition were not true; then there would exist at
least one root, say $\mu^{-1}$ with $|\mu| \ge 1$ (so that the root lies in the closed unit disk), and we could rewrite
$\Phi(z)$ as

$$\Phi(z) = (1 - \mu z)\Phi_1(z),$$

where $\Phi_1(z)$ is a polynomial of order $p - 1$. Put

$$Z(t) = \Phi_1(U)\eta_t, \tag{2.101}$$

$$u_t = \Phi(U)\eta_t = Z(t) - \mu Z(t-1),$$

then

$$E(u_t\eta_{t-k}) = E\big(\Phi(U)\eta_t\,\eta_{t-k}\big) = \sum_{l=0}^{p}\phi_l E(\eta_{t-l}\eta_{t-k}) = \sum_{l=0}^{p}\phi_l R_\eta(k-l) = 0, \quad \forall k \ge 1 \ \text{(see (2.99))}. \tag{2.102}$$

It shows that

$$u_t \perp \eta_{t-k}, \quad \forall k \ge 1, \tag{2.103}$$

thus

$$Z(t-1) = \Phi_1(U)\eta_{t-1} \perp u_t \tag{2.104}$$

holds. Furthermore, since $\eta_t$ would be non-singular, we have

$$\sigma_u^2 = E|\Phi(U)\eta_t|^2 = E\Big|\eta_t + \sum_{k=1}^{p}\phi_k\eta_{t-k}\Big|^2 > 0. \tag{2.105}$$

By (2.102), the following inequality

$$EZ(t)^2 = E|u_t \oplus \mu Z(t-1)|^2 = \sigma_u^2 + \mu^2 EZ(t-1)^2 > \mu^2 EZ(t-1)^2 \ge EZ(t-1)^2 \tag{2.106}$$

contradicts the fact that $Z(t)$ is a stationary series (see (2.101)).
Therefore, the first assertion (2.100) is proved.

Secondly, we can prove that $\eta_t$ is an AR($p$) model. In fact, since $\eta_t$ is a non-singular
series, its covariance function $R_k$ satisfies the condition (2.99) and $\{\phi_1, \phi_2,
\ldots, \phi_p\}$ possesses the “minimum phase shifting” property (2.100), according to
Remark 2 of Corollary 2 of Theorem 2.6, we know that $\eta_t$ is an AR($p$) model. ■

From the proof of this theorem, the following corollary is evident.


Corollary. Suppose that $R(0), R(1), \ldots, R(p)$ are $p + 1$ values of a covariance function
of a non-singular stationary series, then the solution of the Yule-Walker equation
$(\theta_0, \phi_1, \phi_2, \ldots, \phi_p)$ possesses the property

$$\sum_{k=0}^{p}\phi_k z^k \ne 0, \quad |z| \le 1; \qquad \theta_0^2 > 0. \tag{2.107}$$


Theorem 2.9 shows that the set $K_1$ is not empty, and the following theorem
solves the problem of how to select the “optimum” series in $K_1$ under the criterion
of O.P.E.

Theorem 2.10 (Optimality under O.P.E.). Suppose that $R(0), R(1), \ldots, R(p)$
are $p + 1$ values of the covariance function such that $\Gamma_{p+1} > 0$ and $\eta_t \in K_1$, then
the O.P.E. of $\eta_t$ is the maximum in $K_1$ if and only if $\eta_t$ is an AR($p$) model.

Proof. Suppose that $\eta_t \in K_1$ and $c_0^{(\eta)}$ is the maximum in $K_1$, i.e. if there is
another $z_t \in K_1$, then $c_0^{(\eta)} \ge c_0^{(z)}$.

According to Theorem 2.9, there exists an AR($p$) model, say $z_t \in K_1$, and so
we have

$$c_0^{(z)} = \theta_0^{(z)} = \Big\|z_t - \sum_{k=1}^{p}\phi_k^{(p)} z_{t-k}\Big\|, \tag{2.108}$$

where $\phi_k = -\phi_k^{(p)}$, $k = 1, 2, \ldots, p$, and $\theta_0^{(z)}$ are the parameters of the AR($p$) series $z_t$.

Now, for $\eta_t$, taking $\{\phi_k^{(p)}, k = 1, 2, \ldots, p\}$ as the coefficients for prediction, we
have

$$\Big\|\eta_t - \sum_{k=1}^{p}\phi_k^{(p)}\eta_{t-k}\Big\| \ge \|\eta_t - \hat\eta_{t-1,1}\| = c_0^{(\eta)}, \tag{2.109}$$

where

$$\hat\eta_{t-1,1} = \operatorname{Proj}_{H_\eta(t-1)}(\eta_t). \tag{2.110}$$

However, the square of the left hand side of the inequality (2.109) can be rewritten as

$$\Big\|\eta_t - \sum_{k=1}^{p}\phi_k^{(p)}\eta_{t-k}\Big\|^2 = R_\eta(0) + \sum_{k=1}^{p} R_\eta(k)\phi_k = R(0) + \sum_{k=1}^{p}\phi_k R(k) = \big(\theta_0^{(z)}\big)^2 = \big(c_0^{(z)}\big)^2, \tag{2.111}$$

where $R_\eta(k) = R(k)$, $k = 1, 2, \ldots, p$, since $\eta_t \in K_1$.
By (2.111) and (2.109), we have

$$\theta_0^{(z)} = c_0^{(z)} \ge c_0^{(\eta)}, \tag{2.112}$$

which leads to

$$c_0^{(z)} = c_0^{(\eta)}, \tag{2.113}$$

since $c_0^{(\eta)}$ is the maximum in $K_1$.
Accordingly, we have

$$\Big\|\eta_t - \sum_{k=1}^{p}\phi_k^{(p)}\eta_{t-k}\Big\| = c_0^{(\eta)} > 0, \tag{2.114}$$

i.e.

$$\operatorname{Proj}_{H_\eta(t-1)}(\eta_t) = \sum_{k=1}^{p}\phi_k^{(p)}\eta_{t-k}. \tag{2.115}$$

Now, consider the innovation of $\eta_t$, which is defined by

$$\varepsilon_t = \frac{\eta_t - \hat\eta_{t-1,1}}{\|\eta_t - \hat\eta_{t-1,1}\|} = \frac{\eta_t - \sum_{k=1}^{p}\phi_k^{(p)}\eta_{t-k}}{c_0^{(\eta)}} \tag{2.116}$$

as a normalized white noise (see Theorem 2.4b), hence we have

$$\eta_t + \sum_{k=1}^{p}\phi_k\eta_{t-k} = c_0^{(\eta)}\varepsilon_t, \quad t = 0, \pm 1, \pm 2, \ldots, \tag{2.117}$$

which shows that $\eta_t$ is an AR($p$) model, since

$$c_0^{(\eta)} > 0, \qquad \sum_{k=0}^{p}\phi_k z^k \ne 0, \quad |z| \le 1. \tag{2.118}$$

That ends the proof of the necessity of this theorem.

Now, we prove the sufficiency of the theorem:

Let $\eta_t \in K_1$ be an AR($p$) model whose parameters $\{\theta_0, \phi_1, \phi_2, \ldots, \phi_p\}$
satisfy the Yule-Walker equation

$$\Gamma_{p+1}(1, \phi_1, \ldots, \phi_p)^T = (\theta_0^2, 0, \ldots, 0)^T, \tag{2.119}$$

then we have

$$c_0^{(\eta)} = \Big\|\eta_t - \sum_{k=1}^{p}\phi_k^{(p)}\eta_{t-k}\Big\| \quad \big(\phi_k = -\phi_k^{(p)}\big). \tag{2.120}$$

Suppose now that $z_t \in K_1$, then the covariance function satisfies
$R_z(k) = R(k)$, $k = 0, 1, 2, \ldots, p$,
and so

$$\Big\|z_t - \sum_{k=1}^{p}\phi_k^{(p)} z_{t-k}\Big\|^2 = R_z(0) + \sum_{k=1}^{p}\phi_k R_z(k) \tag{2.121}$$

$$= R(0) + \sum_{k=1}^{p}\phi_k R(k) = \theta_0^2 = \big(c_0^{(\eta)}\big)^2 \ge \big(c_0^{(z)}\big)^2. \tag{2.122}$$

The last inequality in (2.122) is based on the fact that $z_t \in K_1$, but the left hand side
of (2.121) may not be the minimum error, which shows that $c_0^{(\eta)}$ is the maximum in
$K_1$. ■

Combining Theorems 2.8 and 2.9, we know that for $R(0), R(1), R(2), \ldots, R(p)$
given and $\Gamma_{p+1} > 0$, the optimum fitting model under the criterion of O.P.E. must be
an AR($p$) model, and the parameters of the model $\{\theta_0, \phi_1, \ldots, \phi_p\}$ are determined
by the Yule-Walker equation

$$\Gamma_{p+1}(1, \phi_1, \ldots, \phi_p)^T = (\theta_0^2, 0, \ldots, 0)^T. \tag{2.123}$$

Remark 4. We have proved that the optimum fitting model $\eta_t$ is an AR($p$) model
belonging to $K_1$, and it is also easy to see that $\eta_t \in K_2$. Indeed, we have

$$f_\eta(\lambda) = \frac{\theta_0^2}{2\pi\,|\Phi(e^{-i\lambda})|^2}, \tag{2.124}$$

$$0 < m \le |\Phi(e^{-i\lambda})| \le M < +\infty, \tag{2.125}$$

$$\log f_\eta(\lambda) = \log\frac{\theta_0^2}{2\pi} - 2\log|\Phi(e^{-i\lambda})| \ge \log\frac{\theta_0^2}{2\pi} - 2\log M, \tag{2.126}$$

$$\int_\Pi \log f_\eta(\lambda)\,d\lambda \ge 2\pi\log\frac{\theta_0^2}{2\pi} - 4\pi\log M > -\infty. \tag{2.127}$$

Accordingly, we now know that the optimum model fitting under the criterion of
O.P.E. in $K_1$ is equivalent to that under the criterion of M.E. in $K_2$.

In the sequel, we sometimes refer to the AR($p$) model fitting briefly as the M.E.
model fitting.

Remark 5. An interesting problem arising in model fitting is the uniqueness of the
fitted model, i.e. whether the AR($p$) series $\eta_t$ defined by Theorem 2.9 is the unique
optimum solution under O.P.E. Mathematically, we want to know if it is possible
to find another model, say $z_t \in K_1$, which also attains the maximum, i.e.

$$c_0^{(z)} = c_0^{(\eta)}. \tag{2.128}$$

In fact, we know from the above two theorems that $\eta_t$ and $z_t$ are both AR($p$)
models whose parameters $\{\theta_0^{(\eta)}, \phi_1, \ldots, \phi_p\}$ and $\{\theta_0^{(z)}, \psi_1, \ldots, \psi_p\}$ satisfy the
following Yule-Walker equations

$$\Gamma_{p+1}^{(\eta)}(1, \phi_1, \ldots, \phi_p)^T = \big((\theta_0^{(\eta)})^2, 0, \ldots, 0\big)^T, \tag{2.129}$$

$$\Gamma_{p+1}^{(z)}(1, \psi_1, \ldots, \psi_p)^T = \big((\theta_0^{(z)})^2, 0, \ldots, 0\big)^T, \tag{2.130}$$

where

$$\Gamma_{p+1}^{(z)} = \begin{pmatrix} R(0) & R(1) & \cdots & R(p) \\ R(1) & R(0) & \cdots & R(p-1) \\ \vdots & & & \vdots \\ R(p) & R(p-1) & \cdots & R(0) \end{pmatrix} = \Gamma_{p+1}^{(\eta)}. \tag{2.131}$$

Therefore, we have $\phi_k = \psi_k$, $1 \le k \le p$, and $\theta_0^{(\eta)} = \theta_0^{(z)}$. Hence, the covariance
function and the spectral density of $z_t$ and $\eta_t$ are the same, but the innovation series $\varepsilon_t^{(z)}$
and $\varepsilon_t^{(\eta)}$ can be selected differently; they can have different statistical distributions.
In this sense, we say that the probability fitting model under M.E. is not unique.


§2.5 M.E. Model Fitting for Observed Data 

The main goal of this section is to introduce the theory and methods of M.E.
model fitting that start from the observed data, in contrast to the last
section, which assumed that $p + 1$ values of the covariance function $R(0), R(1), \ldots, R(p)$
are given.

2.5.1 M.E. Model Fitting with Sample Covariance. 

Suppose that $x_1, x_2, \ldots, x_N$ are consecutively observed data of a stationary non-singular
Gaussian series. Suppose again that the covariance function $R(k)$ of the


series satisfies the condition 

of (see (1.136)) 




1 丑 ⑷ 1 < 吾， 

K t a > 0 


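and put (see (1.137))

$$\gamma_N(k) = \frac{1}{N}\sum_{l=1}^{N-k} x_l x_{k+l}, \quad 0 \le k \le N-1. \tag{2.132}$$

Then by the Corollary of Theorem 1.9, we know that

$$\lim_{N\to+\infty}\gamma_N(k) = R(k) \ \text{ a.s.}, \quad k = 0, 1, 2, \ldots, M, \tag{2.133}$$

where $M$ is a non-negative integer,

$$0 \le M \le N - 1. \tag{2.134}$$

Now, we want to show that the following matrix

$$\Gamma_{M+1} = \begin{pmatrix} \gamma_N(0) & \gamma_N(1) & \cdots & \gamma_N(M) \\ \gamma_N(1) & \gamma_N(0) & \cdots & \gamma_N(M-1) \\ \vdots & & & \vdots \\ \gamma_N(M) & \gamma_N(M-1) & \cdots & \gamma_N(0) \end{pmatrix} \tag{2.135}$$

is a non-negative definite matrix. It suffices to show this for the $N \times N$ matrix $\Gamma_N = (\gamma_N(|i-j|))_{i,j=1}^{N}$, of which $\Gamma_{M+1}$ is a leading principal submatrix.

In fact, for any vector $\boldsymbol\alpha = (\alpha_1, \ldots, \alpha_N)^T$, put

$$L = \begin{pmatrix} x_1 & x_2 & \cdots & x_N & 0 & \cdots & 0 \\ 0 & x_1 & x_2 & \cdots & x_N & \cdots & 0 \\ \vdots & & \ddots & & & \ddots & \vdots \\ 0 & \cdots & 0 & x_1 & x_2 & \cdots & x_N \end{pmatrix}_{N \times (2N-1)}, \tag{2.136}$$

then we can rewrite

$$\boldsymbol\alpha^T\Gamma_N\boldsymbol\alpha = \frac{1}{N}\,\boldsymbol\alpha^T\Big(\sum_{k} x_k x_{k+|i-j|}\Big)_{i,j=1}^{N}\boldsymbol\alpha \tag{2.137}$$

$$= \frac{1}{N}\,\boldsymbol\alpha^T L L^T\boldsymbol\alpha = \frac{1}{N}\,(L^T\boldsymbol\alpha)^T(L^T\boldsymbol\alpha) \ge 0, \tag{2.138}$$

since $\boldsymbol\alpha^T L$ is a $1 \times (2N-1)$ matrix.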


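Accordingly, we may take

$$\gamma_k = \frac{1}{N}\sum_{l=1}^{N-k} x_l x_{k+l}, \quad k = 0, 1, \ldots, p \ (p < N),$$

as the estimates of the $p + 1$ values of the covariance function $R(0), R(1), \ldots, R(p)$; these
are strongly consistent estimates (see Theorem 1.9 and its Corollary).

Suppose that

$$\Gamma_{p+1} = \begin{pmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_p \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{p-1} \\ \vdots & & & \vdots \\ \gamma_p & \gamma_{p-1} & \cdots & \gamma_0 \end{pmatrix} > 0, \quad 0 \le p \le N-1, \tag{2.139}$$

then by the theory and methods of the preceding section, under the criterion of
M.E., the optimum fitted model $\eta_t$ is an AR($p$) model. Its model equation is

$$\sum_{k=0}^{p}\phi_k\eta_{t-k} = \theta_0\varepsilon_t, \tag{2.140}$$

and the parameters $(\theta_0, \phi_1, \phi_2, \ldots, \phi_p)$ are determined by

$$\Gamma_{p+1}(1, \phi_1, \phi_2, \ldots, \phi_p)^T = (\theta_0^2, 0, \ldots, 0)^T, \tag{2.141}$$

which shows that

$$\eta_t = \int_{-\pi}^{\pi} e^{i\lambda t}\,\frac{\theta_0}{\sum_{k=0}^{p}\phi_k e^{-ik\lambda}}\,dZ_\varepsilon(\lambda), \quad t = 0, \pm 1, \pm 2, \ldots \tag{2.142}$$

is a reasonable fitted model for the observed data $x_1, x_2, \ldots, x_N$ when $N$ is sufficiently
large.

The spectral estimate for $\{x_t\}$ is

$$f_\eta(\lambda) = \frac{\theta_0^2}{2\pi\,\big|\sum_{k=0}^{p}\phi_k e^{-ik\lambda}\big|^2}, \quad \lambda \in \Pi, \tag{2.143}$$

and the covariance estimate is

$$R_\eta(k) = \int_{-\pi}^{\pi} e^{-ik\lambda} f_\eta(\lambda)\,d\lambda. \tag{2.144}$$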
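
The spectral estimate (2.143) and the covariance estimate (2.144) can be evaluated directly once the AR parameters are known. The following Python sketch is illustrative only: the function names, the midpoint quadrature, and the AR(1) parameters in the example are assumptions, not taken from the text. Because $f_\eta$ is real and even, the integral in (2.144) reduces to an integral of $\cos(k\lambda) f_\eta(\lambda)$.

```python
import cmath
import math


def ar_spectrum(lam, phi, theta0_sq):
    """AR spectral estimate (2.143); phi = [1, phi_1, ..., phi_p] as in (2.140)."""
    Phi = sum(ph * cmath.exp(-1j * k * lam) for k, ph in enumerate(phi))
    return theta0_sq / (2.0 * math.pi * abs(Phi) ** 2)


def covariance_from_spectrum(k, phi, theta0_sq, n=4096):
    """Covariance estimate (2.144), approximated by the midpoint rule with n nodes."""
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        lam = -math.pi + (i + 0.5) * h
        total += math.cos(k * lam) * ar_spectrum(lam, phi, theta0_sq)
    return total * h


# Hypothetical AR(1): eta_t - 0.5 eta_{t-1} = eps_t, for which R(k) = 0.5**k / 0.75.
phi = [1.0, -0.5]
r0 = covariance_from_spectrum(0, phi, 1.0)
r1 = covariance_from_spectrum(1, phi, 1.0)
print(r0, r1)
```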


2.5.2 Order Selection Problem. 


In the M.E. model fitting problem discussed before, the order $p$ is assumed to
be known in advance. If we start from $N$ observed data, how to determine the
parameter $p$ is a very interesting and important problem. As shown before, the
model fitting is based on the estimates $\{\gamma_k\}_0^p$ of $\{R(k)\}_0^p$, where

$$0 \le p \le N - 1, \tag{2.145}$$

but if we select $p = N - 1$ then we have

$$\gamma_{N-1} = \frac{1}{N}\,x_1 x_N. \tag{2.146}$$

It is apparent that using (2.146) in place of $R(N-1)$ will introduce a serious error in
the fitted model under the M.E. criterion. However, if we select a very low order $p$,
we may sometimes not get a very good fitting model. How to select an appropriate
order $p$ in AR($p$) model fitting was a challenging problem in the last decade and
has attracted many statisticians working in this field. Due to the limited size
of this book we can only introduce some of the fruitful results on this problem, and
all the proofs are omitted here.

On the order selection problem, many criteria and methods have been introduced
by several researchers. One of the most famous and widely used in practice is the
Akaike Information Criterion (AIC) (see Akaike (1974)), though this criterion still
has some defects from the statistical point of view.

The basic idea of AIC can be introduced briefly as follows:

Suppose that $x_t$ is an AR model, and the real order $p$ is unknown. Now, we
consider the model possessing parameters $\theta_0 = (\phi_1, \ldots, \phi_s, 0, \ldots, 0) \in \Theta^{(P)}$,
$1 \le p \le s \le P$, where $\Theta^{(P)}$ is a vector set of dimension $P$.

Evidently, the $N$-dimensional joint probability distribution $p_0(\mathbf{x})$ of $\mathbf{x} = (x_1, x_2,
\ldots, x_N)$ of $x_t$ depends on $\theta_0$, and can be denoted as

$$g(\mathbf{x}|\theta_0) = p_0(\mathbf{x}), \tag{2.147}$$

then according to the Kullback information*, we have

$$I(p_0, g(\mathbf{x}|\theta_0)) = 0. \tag{2.148}$$

If $\theta_0$ are unknown parameters, then we select a vector $\hat\theta_0 \in \Theta^{(P)}$ as the estimate of
$\theta_0$, such that the information

$$I(p_0, g(\mathbf{x}|\hat\theta_0)) = \text{Min}, \quad \hat\theta_0 \in \Theta^{(P)}, \tag{2.149}$$

and the minimum dimension of the non-zero elements in $\hat\theta_0$, say $s$, will be an appropriate
estimate of the order, where

$$1 \le s \le P. \tag{2.150}$$

Akaike (1974) proved that, under the idea mentioned above, (2.149) leads to the
minimization of the following quantity

$$\mathrm{AIC}(s) = \log(\hat\sigma_s^2) + \frac{2s}{N}, \quad 0 \le s \le P, \tag{2.151}$$

where $N$ is the sample size of the observations, $s$ varies from 0 to $P$, and $\hat\sigma_s^2$ is the
maximum likelihood estimate of the residual variance.

*Let $P_1$, $P_2$ be two probability density functions; the Kullback information is defined as

$$I(P_1, P_2) = \int P_1(z)\log\frac{P_1(z)}{P_2(z)}\,dz,$$

and $P_1 = P_2$ iff $I(P_1, P_2) = 0$.

In application, under the AR model, we can obtain $\hat\sigma_s^2$ from the $\theta_0^2(s)$ of the
Yule-Walker equation of order $s$ (see (2.141)). If there are two values $s_1 < s_2$
such that

$$\mathrm{AIC}(s_1) = \mathrm{AIC}(s_2), \tag{2.152}$$

then, according to the principle of parsimony, we select $s_1$ as the order
estimate.

Summarizing the preceding discussion, we have the following procedure for model
fitting under the M.E. criterion.

1. Put

$$\gamma_k = \frac{1}{N}\sum_{l=1}^{N-k} x_l x_{k+l}, \quad k = 0, 1, \ldots, P \ (P < N). \tag{2.153}$$

A reasonable value of $P$ is $O(\log(N))$. In practice, $P$ can be selected as $K\sqrt{N}$,
$1/2 \le K \le 2$.

2. Solve the Yule-Walker equation (2.141) by the Levinson recursive algorithm
(see Theorem 2.7) for $s = 0, 1, 2, \ldots, P$.

3. For each $s$, let $(\theta_0^2(s), \phi_1^{(s)}, \ldots, \phi_s^{(s)})$ be the solution of (2.141) and calculate
AIC($s$):

$$\mathrm{AIC}(s) = \log(\theta_0^2(s)) + \frac{2s}{N}, \quad 0 \le s \le P, \tag{2.154}$$

and select $s = s_0$, which is the smallest one such that

$$\mathrm{AIC}(s_0) = \min_{s}\{\mathrm{AIC}(s)\}. \tag{2.155}$$

4. The fitting model for the observed data $\{x_k, 1 \le k \le N\}$ under the criterion
of M.E. is

$$x_t - \sum_{k=1}^{s_0}\phi_k^{(s_0)} x_{t-k} = \theta_0^{(s_0)}\varepsilon_t. \tag{2.156}$$

5. The spectral estimate of the data can be obtained as in (2.143).

Once the model fitting (2.156) is done, further statistical analysis of the observed
data, such as forecasting and periodicity analysis, can be carried out.
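
Steps 1 to 3 of the procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names are hypothetical, the data are simulated from an assumed zero-mean AR(1) series with coefficient 0.8, and the maximum order $P = 10$ is an arbitrary choice. The Levinson recursion (2.58) is run once and its intermediate solutions are kept, so that every order $s = 0, 1, \ldots, P$ is obtained in a single pass.

```python
import math
import random


def sample_covariance(x, max_lag):
    """Sample covariances gamma_0, ..., gamma_{max_lag} as in (2.153) (zero-mean data)."""
    n = len(x)
    return [sum(x[l] * x[l + k] for l in range(n - k)) / n for k in range(max_lag + 1)]


def levinson_all_orders(gamma):
    """Return [(s, phi^(s), theta0^2(s))] for s = 0, ..., len(gamma) - 1 via (2.58)."""
    phi, sigma2 = [], gamma[0]
    fits = [(0, [], sigma2)]
    for k in range(len(gamma) - 1):
        num = gamma[k + 1] - sum(gamma[k + 1 - j] * phi[j - 1] for j in range(1, k + 1))
        last = num / sigma2                      # phi_{k+1}^(k+1)
        phi = [phi[j - 1] - last * phi[k - j] for j in range(1, k + 1)] + [last]
        sigma2 = gamma[0] - sum(gamma[j] * phi[j - 1] for j in range(1, k + 2))
        fits.append((k + 1, phi, sigma2))
    return fits


def fit_ar_by_aic(x, max_order):
    """Select the order by AIC (2.154); ties go to the smallest order, as in (2.155)."""
    n = len(x)
    fits = levinson_all_orders(sample_covariance(x, max_order))
    # min() returns the first minimal element, i.e. the smallest order s.
    return min(fits, key=lambda fit: math.log(fit[2]) + 2.0 * fit[0] / n)


# Simulated data (assumption): x_t = 0.8 x_{t-1} + e_t with standard normal e_t.
rng = random.Random(0)
x, prev = [], 0.0
for t in range(2200):
    prev = 0.8 * prev + rng.gauss(0.0, 1.0)
    if t >= 200:                                 # discard a burn-in segment
        x.append(prev)

s0, phi, sigma2 = fit_ar_by_aic(x, 10)
print(s0, phi, sigma2)
```

The returned coefficients follow the sign convention of (2.156), so for these simulated data the first coefficient should be close to 0.8. The selected order may exceed 1, since AIC tends to over-estimate the order.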

Remark 1. In recent years, several very good algorithms for solving the
Yule-Walker equation (2.141) have been proposed, such as Burg (1975), Marple (1980), etc.

Another important problem arising in order estimation is the following: suppose that the
data $\{x_k, 1 \le k \le N\}$ are recorded from an AR($p$) model, and $\hat p_N$ is the order
estimated by AIC. One wants to know whether $\hat p_N \to p$ when
$N$ is sufficiently large. This is the so-called “consistency” problem. Shibata (1976)
showed that even in the probability sense, the order estimate $\hat p_N$ by AIC is not
consistent.*

Theorem 2.11 (Shibata). Suppose that $x_t$ is a Gaussian AR($p_0$) model, where
$0 \le p_0 \le P$, and $x_1, x_2, \ldots, x_N$ are samples of $x_t$. Let $\hat p_N$ be the order estimated
by AIC, then the following results

$$\lim_{N\to+\infty} P\{\omega : \hat p_N(\omega) = s\} = q_{s-p_0}\,Q_{P-s}, \quad p_0 \le s \le P;$$

$$\lim_{N\to+\infty} P\{\omega : \hat p_N(\omega) = s\} = 0, \quad 0 \le s < p_0 \tag{2.157}$$

hold, where

$$q_n = \sum{}^{*}\Big\{\prod_{i=1}^{n}\frac{1}{r_i!}\Big(\frac{\alpha_i}{i}\Big)^{r_i}\Big\}, \tag{2.158}$$

$$Q_n = \sum{}^{*}\Big\{\prod_{i=1}^{n}\frac{1}{r_i!}\Big(\frac{1-\alpha_i}{i}\Big)^{r_i}\Big\}, \tag{2.159}$$

with $\alpha_i = P\{\chi_i^2 > 2i\}$, $\chi_i^2$ being the Chi-square random variable with $i$ degrees
of freedom, and $\sum^{*}$ represents the summation running over all of the integers
$\{r_1, r_2, \ldots, r_n\}$ such that

$$\sum_{i=1}^{n} i\,r_i = n \tag{2.160}$$

(see Spitzer (1956)).

Shibata's theorem shows that, on the one hand, AIC is not a consistent estimate
and usually over-estimates the order; on the other hand, Shibata gives some interesting numerical
results, which show that the probability

$$P_N = P\{\omega : \hat p_N(\omega) = p_0\}$$

is still not small when $N$ is sufficiently large.

The probabilities $P_0 = \lim_{N\to+\infty} P\{\hat p_N = p_0\}$ for $P = 10$ are listed in
Table 2.1.


Table 2.1

  $p_0$    0      1      2      3      4      5      6      7      8      9
  $P_0$  .7171  .7188  .7210  .7241  .7285  .7349  .7446  .7602  .7874  .8427

*H. Akaike showed the non-consistency of AIC by himself and suggested another order determina¬ 
tion criterion as BIC. 
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Since the probability Pq is not less than 70% in Table 2.1, many AR model fitting 
with the orders determined by AIC still to be successful in practice. 

Akaike suggested another order selection function,

$$\mathrm{BIC}(s) = \log(\hat\sigma^2(s)) + \frac{s\log(N)}{N}. \tag{2.161}$$

Comparing (2.161) with (2.154), the reader can see that the only difference between
them is the factor "$\log(N)$" in BIC in place of the constant "2" in AIC.

An, Chen and Hannan (1982) proved that the order estimate by BIC is a consistent
estimate of the order of AR($p$).

Theorem 2.12 (An-Chen-Hannan). Suppose that $x_t$ is a Gaussian AR($p_0$)
model and $x_1, x_2, \ldots, x_N$ are observed samples. Put

$$P(N) = O(\log(N)), \tag{2.162}$$

and denote by $\hat p_N$ the order estimate of the model which minimizes $\mathrm{BIC}(s)$ in
the interval $[0, P(N)]$; then

$$\hat p_N \to p_0, \quad \text{a.s.} \quad (N \to +\infty) \tag{2.163}$$

holds.

Hannan and Quinn (1979) suggested another criterion for order selection,

$$\mathrm{HIC}(s) = \log(\hat\sigma^2(s)) + c\,\frac{s}{N}\log\log(N), \qquad c > 2. \tag{2.164}$$

Under some mathematical conditions, they proved that HIC also gives a consistent
estimate of the order.

Remark 2. The preceding model fitting theory and methods are developed
under the criterion of O.P.E. (or M.E.). From the mathematical point of view,
such model fitting is equivalent to fitting an AR model. Therefore,
if the investigated object does not deviate far from an AR model, one can expect
the model fitting introduced above to give very satisfactory results; otherwise
difficulties may arise.

Hence, some researchers seek more suitable fitting models, e.g. the
ARMA model. The following procedure for ARMA modelling, in which
order selection is not involved, is sometimes convenient for practical applications
(see Dzhaparidze and Yaglom (1983)):

Step 1. Starting from the observations $\{x_t\}_1^N$, calculate the sample covariances
$\hat\gamma_0, \hat\gamma_1, \ldots, \hat\gamma_{p+q}$ as in (2.153).

Step 2. Since the covariance function of an ARMA model satisfies the following
equation (see (2.42))

$$\sum_{k=0}^{p}\varphi_k R(n-k) = 0, \qquad n > q,$$

we can obtain the estimates $\hat\varphi_1, \hat\varphi_2, \ldots, \hat\varphi_p$ from the following "Extended
Yule-Walker" equation

$$\sum_{k=0}^{p}\hat\gamma_{n-k}\hat\varphi_k = 0, \qquad n = q+1, q+2, \ldots, q+p,$$

where we assume that the orders $(p, q)$ are known in advance.

Step 3. Put

$$y_t = \sum_{k=0}^{p}\varphi_k x_{t-k};$$

if $x_t$ is an ARMA model, then

$$y_t = \sum_{k=0}^{q}\theta_k\varepsilon_{t-k}$$

is an MA($q$) series, and its covariance function satisfies the equation

$$R_y(n) = \sum_{k=n}^{q}\theta_k\theta_{k-n}, \qquad 0 \le n \le q.$$

Now, after Step 2, we can define

$$\hat\gamma_y(n) = \sum_{k=0}^{p}\sum_{l=0}^{p}\hat\varphi_k\hat\varphi_l\hat\gamma_{n-k+l}, \qquad \hat\varphi_0 = 1, \quad n = 0, 1, \ldots, q,$$


(2.165) 


(2.166) 


(2.167) 


(2.168) 


as the estimates of $\{R_y(n)\}$ and solve the following equation for the estimates
$\{\hat\theta_0, \ldots, \hat\theta_q\}$:

$$\hat\gamma_y(n) = \sum_{k=n}^{q}\hat\theta_k\hat\theta_{k-n}, \qquad n = 0, 1, 2, \ldots, q. \tag{2.169}$$


Step 4. The fitted ARMA model is defined as the stationary solution of the
following equation

$$\sum_{k=0}^{p}\hat\varphi_k\hat x_{t-k} = \sum_{l=0}^{q}\hat\theta_l\varepsilon_{t-l}. \tag{2.170}$$
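
Steps 2 and 3 of the procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation. The covariances are the exact values of a hypothetical ARMA(1,1) model $x_t - 0.5x_{t-1} = \varepsilon_t + 0.4\varepsilon_{t-1}$ with unit innovation variance, used in place of sample covariances, and the helper names are invented for this example.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def extended_yule_walker(gamma, p, q):
    """Step 2: [1, phi_1..phi_p] from sum_k gamma(n-k) phi_k = 0, n = q+1..q+p."""
    def g(h):
        return gamma[abs(h)]          # real series: gamma(-h) = gamma(h)
    A = [[g(n - k) for k in range(1, p + 1)] for n in range(q + 1, q + p + 1)]
    b = [-g(n) for n in range(q + 1, q + p + 1)]
    return [1.0] + solve(A, b)

def ma_covariances(gamma, phi, q):
    """Step 3: gamma_y(n) = sum_k sum_l phi_k phi_l gamma(n-k+l), n = 0..q."""
    def g(h):
        return gamma[abs(h)]
    p = len(phi) - 1
    return [sum(phi[k] * phi[l] * g(n - k + l)
                for k in range(p + 1) for l in range(p + 1))
            for n in range(q + 1)]

# Exact covariances gamma(0), gamma(1), gamma(2) of the assumed ARMA(1,1) model.
gamma = [2.08, 1.44, 0.72]
phi = extended_yule_walker(gamma, p=1, q=1)
gamma_y = ma_covariances(gamma, phi, q=1)
print(phi, gamma_y)
```

With these inputs the AR coefficient comes out as $-0.5$ and the MA covariances as $1.16$ and $0.4$, so (2.169) with $q = 1$ is solved by $\hat\theta_0 = 1$, $\hat\theta_1 = 0.4$ (the invertible root), which recovers the assumed model.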
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Remark 3. From the AR model fitting we know that O.P.E. is a good criterion for 
modelling. A very interesting problem arises after Burg's M.E. model fitting:
under the same criterion, what kind of constraints should be
imposed so that the optimum fitted model is an ARMA($p, q$) model? This
problem perplexed statisticians for many years after M.E. modelling
appeared in 1967.

Frank (1985) reported the answer to this problem as follows.

Suppose that $x_t$ is a regular stationary series (see (2.83)) and that the Wold
decomposition of $x_t$ is

$$x_t = \varepsilon_t + h_1\varepsilon_{t-1} + \cdots + h_s\varepsilon_{t-s} + \cdots \tag{2.171}$$

where $\varepsilon_t$ is an unnormalized innovation series. Now, suppose that $p + 1$ values of the
C.F. and $q$ coefficients

$$R_k, \quad 0 \le k \le p; \qquad h_l, \quad 1 \le l \le q, \tag{2.172}$$

are given. Define the set $\mathcal{R}$ of stationary series as

$$\mathcal{R} = \{\eta_t : R_\eta(k) = R_k,\ 0 \le k \le p;\ h_\eta(s) = h_s,\ 1 \le s \le q\}; \tag{2.173}$$

then, under the criterion of O.P.E., $\eta_t$ is the optimum model in $\mathcal{R}$, i.e. the one which maximizes $\sigma_\varepsilon^2$,
if and only if $\eta_t$ is an ARMA($p, q$) model.

A difficulty in applying this result in practice is: if we start from the observed
data $\{x_1, x_2, \ldots, x_N\}$, how can we estimate the coefficients $\{h_1, h_2, \ldots, h_p\}$?

Luan and Xie (1991) suggested an efficient algorithm for estimating the
coefficients $\{h_1, h_2, \ldots, h_p\}$ and proved that such estimates are consistent.
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CHAPTER 3 

Prediction, Filtering and Spectral Analysis 
of Time Series 


In this chapter, the theory and methods of prediction, filtering and spectral
analysis of time series are introduced. The series considered are mainly stationary,
such as AR or ARMA models.


§3.1 Prediction of Time Series 

In this section we first discuss the prediction problem for stationary time
series. Besides the theory, some very useful prediction procedures are also
introduced.

The general prediction problem can be stated mathematically
as follows:

Suppose that $x(t)$ is a time series and $\{x(s), s \le t\}$ are observed samples, e.g. $\{\ldots,
x(-2), x(-1), x(0)\}$. For $M > 0$, we want to find a prediction function of
$\{x(s), s \le t\}$,

$$\hat\xi_{t,M} = f(x(t), x(t-1), \ldots) \tag{3.1}$$

such that the prediction error of $\hat\xi_{t,M} \in H_x(t)$, i.e.

$$\|x_{t+M} - \hat\xi_{t,M}\|^2 \tag{3.2}$$

is minimized in $H_x(t)$.

Now, suppose that $x(t)$ is a regular stationary series whose Wold
decomposition (see (2.83), (2.84)) is

$$x(t) = \sum_{k=0}^{\infty} c_k\varepsilon(t-k), \tag{3.3}$$

where $\varepsilon(t)$ is the innovation series satisfying conditions 1-4 in Theorem 2.3.
Then the prediction problem (3.1), (3.2) can be solved theoretically in the following
way: for any $M > 0$, we have

$$x(t+M) = \sum_{k=0}^{\infty} c_k\varepsilon(t+M-k) = \sum_{k=0}^{M-1} c_k\varepsilon(t+M-k) + \sum_{k=M}^{\infty} c_k\varepsilon(t+M-k),$$

which can be resolved into two orthogonal variables

$$x(t+M) = \tilde x_{t,M} \oplus \hat x_{t,M} \tag{3.4}$$

with

$$\tilde x_{t,M} = \sum_{k=0}^{M-1} c_k\varepsilon(t+M-k), \qquad \hat x_{t,M} = \sum_{k=M}^{\infty} c_k\varepsilon(t+M-k). \tag{3.5}$$

By the well-known Riesz theorem in Hilbert space, we know that the optimum
solution of (3.2) is

$$\hat\xi_{t,M} = \operatorname{Proj}_{H_x(t)}\{x(t+M)\}. \tag{3.6}$$

Now, by the orthogonal decomposition (3.4), it is evident that

$$\varepsilon(t+M-k) \in H_x(t), \qquad k = M, M+1, \ldots \tag{3.7}$$

and

$$\varepsilon(t+M-k) \perp H_x(t), \qquad k = 0, 1, 2, \ldots, M-1. \tag{3.8}$$

Thus $\tilde x_{t,M} \perp H_x(t)$ and $\hat x_{t,M} \in H_x(t)$, i.e. the optimum solution of the prediction
problem (3.2) is

$$\hat\xi_{t,M} = \hat x_{t,M}. \tag{3.9}$$

Hence, the prediction problem in the Hilbert space $H_x$ can be solved easily by the Wold
decomposition. But the reader must bear in mind that, in practical problems, what
one wants is the prediction $\hat x_{t,M}$ expressed in terms of the
samples $(x(t), x(t-1), \ldots)$ rather than in terms of the innovations $(\varepsilon(t), \varepsilon(t-1), \ldots)$.

Since the innovation series $\{\varepsilon(t)\}$ is unobservable, and estimating the $\{c_k\}$ from
the samples $\{x(k)\}$ is not so easy, we need to develop prediction procedures for
practical purposes.

In the following, we show how to obtain the optimum solution $\hat x_{t,M}$
in terms of the samples under the AR, MA and ARMA models.


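3.1.1 The Prediction Formula for AR Models.

The following theorem gives a basic prediction formula for AR models:

Theorem 3.1 (Prediction for AR). Suppose that $x(t)$ is an AR($p$) model
($p \ge 1$) with equation

$$\sum_{k=0}^{p}\varphi_k x(t-k) = \theta_0\varepsilon(t); \tag{3.10}$$

then for any $M > 0$, the optimum prediction $\hat x_{t,M}$ can be represented as

$$\hat x_{t,M} = \frac{1}{\theta_0}\sum_{k=0}^{p-1}\beta_k^{(M)}x(t-k), \tag{3.11}$$

where $\beta_k^{(M)}$ is defined by

$$\begin{pmatrix}\beta_0^{(M)}\\ \beta_1^{(M)}\\ \vdots\\ \beta_{p-1}^{(M)}\end{pmatrix} = \begin{pmatrix}1 & & & 0\\ \varphi_1 & 1 & & \\ \vdots & \ddots & \ddots & \\ \varphi_{p-1} & \cdots & \varphi_1 & 1\end{pmatrix}\begin{pmatrix}c_M\\ c_{M+1}\\ \vdots\\ c_{M+p-1}\end{pmatrix} \tag{3.12}$$

and $\{c_k\}$ are the Wold coefficients (see (2.36)), obtainable from the parameters of the
model.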


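Proof. By (3.9) and (3.5) we know that the optimum prediction is

$$\hat x_{t,M} = \sum_{k=M}^{\infty} c_k\varepsilon(t+M-k). \tag{3.13}$$

Since $x(t)$ is an AR model, the stochastic integral representation of $\varepsilon(t)$ is (see
(2.18a,b))

$$\varepsilon(t) = \int_\Pi e^{it\lambda}\,\Gamma^{-1}(e^{-i\lambda})\,dZ_x(\lambda), \tag{3.14}$$

where

$$\Gamma^{-1}(e^{-i\lambda}) = \theta_0^{-1}\sum_{k=0}^{p}\varphi_k e^{-ik\lambda}. \tag{3.15}$$

Substituting (3.14) into formula (3.13), we have

$$\hat x_{t,M} = \int_\Pi\Big(\sum_{k=M}^{\infty} c_k e^{i(t+M-k)\lambda}\Big)\Gamma^{-1}(e^{-i\lambda})\,dZ_x(\lambda) = \frac{1}{\theta_0}\int_\Pi e^{i(t+M)\lambda}\Big(\sum_{k=M}^{\infty}\sum_{l=0}^{p} c_k\varphi_l e^{-i(k+l)\lambda}\Big)dZ_x(\lambda). \tag{3.16}$$

Putting

$$V = \sum_{k=M}^{\infty}\sum_{l=0}^{p} c_k\varphi_l e^{-i(k+l)\lambda}, \tag{3.17}$$

and changing the order of summation (with $s = k + l$), we have

$$V = \Big(\sum_{s=M}^{M+p-1}\sum_{l=0}^{s-M} + \sum_{s=M+p}^{\infty}\sum_{l=0}^{p}\Big)e^{-is\lambda}c_{s-l}\varphi_l. \tag{3.18}$$

Using the recursive formula for the Wold coefficients (see (2.36))

$$\sum_{l=0}^{p}\varphi_l c_{s-l} = 0, \qquad \forall s \ge p,$$

the second term of the sum $V$ in (3.18) is zero, and

$$\hat x_{t,M} = \frac{1}{\theta_0}\int_\Pi e^{i(t+M)\lambda}\sum_{s=M}^{M+p-1}\sum_{l=0}^{s-M}e^{-is\lambda}\varphi_l c_{s-l}\,dZ_x(\lambda). \tag{3.19}$$

By putting $j = s - M$, (3.19) can be rewritten as

$$\hat x_{t,M} = \frac{1}{\theta_0}\int_\Pi e^{i(t+M)\lambda}\sum_{j=0}^{p-1}\Big(\sum_{l=0}^{j}\varphi_l c_{j+M-l}\Big)e^{-i(j+M)\lambda}\,dZ_x(\lambda) \tag{3.20}$$

$$= \sum_{j=0}^{p-1}\big(\beta_j^{(M)}/\theta_0\big)\int_\Pi e^{i(t-j)\lambda}\,dZ_x(\lambda), \tag{3.21}$$

where

$$\beta_0^{(M)} = c_M, \qquad \beta_j^{(M)} = \sum_{l=0}^{j}\varphi_l c_{j+M-l}, \quad 0 < j \le p-1. \tag{3.22}$$

Since $x(t-j) = \int_\Pi e^{i(t-j)\lambda}\,dZ_x(\lambda)$, (3.21) is exactly (3.11).

That ends the proof. ∎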

The result of Theorem 3.1 is extremely important both in theory and in
applications, since it shows that for an AR($p$) model and for any $M > 0$ steps ahead,
the prediction needs only the $p$ most recent observations; further data are of no use.
In particular, if $p = 1$, we have the conclusion:

"The prediction of the future depends only on the current observation and is free from the
historical data."

That is the so-called Markovian property.

Theorem 3.1 makes it easy to understand why researchers prefer the
AR model to MA or ARMA models.

Example 1. The average rainfall in city A is $Ey = 540$ (mm), and the deviation of the
rainfall $y(t)$ from $Ey$, denoted by $x(t)$, satisfies the following AR(2) model

$$x(t) + 0.54x(t-1) - 0.3x(t-2) = 1.1\varepsilon(t). \tag{3.23}$$

The record of $y(t)$ in recent years is as follows:

t                     0     -1     -2     -3     -4
y(t)                576    496    585    470    560        (3.24)
x(t) = y(t) - Ey     36    -44     45    -70     20

For $M = 3$, we want to make the prediction $\hat x_{0,3}$ based on the record (3.24).
Evidently, the parameters of the AR(2) model (3.23) are
$\theta_0 = 1.1$, $\varphi_1 = 0.54$, $\varphi_2 = -0.3$, $p = 2$, $M = 3$. By (3.13) we have the Wold coefficients

$$c_0 = 1.1,$$
$$c_1 = -\varphi_1 c_0 = -0.594,$$
$$c_2 = -\varphi_1 c_1 - \varphi_2 c_0 = 0.651,$$
$$c_3 = -\varphi_1 c_2 - \varphi_2 c_1 = -0.530,$$
$$c_4 = 0.482.$$

Hence, by (3.12), $\{\beta_k^{(3)}\}$ are

$$\begin{pmatrix}\beta_0^{(3)}\\ \beta_1^{(3)}\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0.54 & 1\end{pmatrix}\begin{pmatrix}-0.530\\ 0.482\end{pmatrix} = \begin{pmatrix}-0.530\\ 0.195\end{pmatrix}.$$

So,

$$\hat x_{0,3} = \frac{1}{1.1}(-0.530x_0 + 0.195x_{-1}) = -25.14,$$

i.e.

$$\hat y_{0,3} = \hat x_{0,3} + Ey = 514.9.$$

Theorem 3.2 (Recursive Algorithm for Prediction of AR). Let the AR($p$)
model equation of $x(t)$ be

$$x(t) + \sum_{k=1}^{p}\varphi_k x(t-k) = \theta_0\varepsilon(t);$$

then, for $M > 0$, the $M$-step ahead prediction of $x(t)$ is

$$\hat x_{t,M} = -\sum_{k=1}^{p}\varphi_k\hat x_{t,M-k}, \tag{3.25}$$

where

$$\hat x_{t,M-k} = x(t+M-k), \qquad \text{if } M - k \le 0.$$

The proof of Theorem 3.2 is evident and is omitted here.

By (3.25), the following recursive algorithm is convenient for applications:

$$\hat x_{t,1} = -\sum_{k=1}^{p}\varphi_k x(t-k+1), \tag{3.26}$$

$$\hat x_{t,2} = -\sum_{k=2}^{p}\varphi_k x(t-k+2) - \varphi_1\hat x_{t,1}, \tag{3.27}$$

$$\hat x_{t,3} = -\sum_{k=3}^{p}\varphi_k x(t-k+3) - \varphi_2\hat x_{t,1} - \varphi_1\hat x_{t,2}, \tag{3.28}$$

etc.

In short:

$$\hat x_{t,s} = -\sum_{k=s}^{p}\varphi_k x(t-k+s) - \varphi_{s-1}\hat x_{t,1} - \cdots - \varphi_1\hat x_{t,s-1}, \qquad 1 \le s \le p, \tag{3.29}$$

$$\hat x_{t,s} = -\sum_{k=1}^{p}\varphi_k\hat x_{t,s-k}, \qquad p < s. \tag{3.30}$$
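
The closed form of Theorem 3.1 and the recursion of Theorem 3.2 can be compared on the rainfall data of Example 1. The Python sketch below is only an illustration. The function names are hypothetical, and the Wold coefficients are generated by the standard AR recursion $c_0 = \theta_0$, $c_j = -\sum_{l \ge 1}\varphi_l c_{j-l}$, which is assumed here to be the content of (2.36).

```python
def wold_coefficients(phi, theta0, n):
    """c_0..c_{n-1} for sum_k phi[k] x(t-k) = theta0 * e(t), with phi[0] = 1."""
    c = [theta0]
    for j in range(1, n):
        c.append(-sum(phi[l] * c[j - l]
                      for l in range(1, min(j, len(phi) - 1) + 1)))
    return c

def predict_closed_form(phi, theta0, past, M):
    """Theorem 3.1; past = [x(t), x(t-1), ..., x(t-p+1)]."""
    p = len(phi) - 1
    c = wold_coefficients(phi, theta0, M + p)
    beta = [sum(phi[l] * c[j + M - l] for l in range(j + 1)) for j in range(p)]
    return sum(b * x for b, x in zip(beta, past)) / theta0

def predict_recursive(phi, past, M):
    """Theorem 3.2: apply (3.25) M times, reusing earlier predictions."""
    p = len(phi) - 1
    hist = list(past[:p])             # hist[0] is the most recent value
    for _ in range(M):
        nxt = -sum(phi[k] * hist[k - 1] for k in range(1, p + 1))
        hist.insert(0, nxt)
    return hist[0]

phi = [1.0, 0.54, -0.3]               # model (3.23)
past = [36.0, -44.0]                  # x(0), x(-1) from the record (3.24)
a = predict_closed_form(phi, 1.1, past, 3)
b = predict_recursive(phi, past, 3)
print(round(a, 2), round(b, 2), round(b + 540, 1))
```

Both routes give $\hat x_{0,3} \approx -25.14$ and hence $\hat y_{0,3} \approx 514.9$, in agreement with Example 1. The recursion needs neither $\theta_0$ nor the Wold coefficients, which is why it is the more convenient form in practice.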
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3.1.2 The Prediction Formula for ARMA Models. 

For ARMA models, we have the following prediction formula which is based on 
the recursive algorithm. 

Theorem 3.3 (Recursive Vector Prediction for ARMA Model). Suppose 
that the ARMA model equation of x[t) is 


^ <t>kx(t - k) ~ 0- 


where 

xt,l = Proj{x(t + /)}, I <1 < q, 

^x(t) 

then the recursive vector prediction algorithm for ARMA model is 
W? +1 = GW? + ax(t + 1) +/? 


h =| 


--0 0 
CO 

,———iq -^q-l ^o-2 

、 Co 

0 < J < P, 

0 ， 3 > P, 

Cl_ £2 \ 

、 c 0 ’ c 0 ’ Co J 


/? = f o,... ,o,- 乞彡，外 + 9 - i + i )) ， 


and put /? = 0, when p < q. 


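Proof. Suppose that $M > 0$ is a positive integer. On account of the ARMA
equation, we have

$$\sum_{k=0}^{p}\varphi_k x(t+M-k) = \sum_{l=0}^{q}\theta_l\varepsilon(t+M-l). \tag{3.38}$$

Taking the projection on $H_x(t)$ of both sides of (3.38), we get

$$\sum_{k=0}^{p}\varphi_k\hat x_{t,M-k} = \sum_{l=0}^{q}\theta_l\operatorname{Proj}_{H_x(t)}\{\varepsilon(t+M-l)\}; \tag{3.39}$$

$$\hat x_{t,M} = -\sum_{k=1}^{p}\varphi_k\hat x_{t,M-k}, \qquad \text{when } M > q. \tag{3.40}$$

By (3.5), we have

$$\hat x_{t+1,M} = \sum_{k=M}^{\infty}c_k\varepsilon(t+1+M-k) = c_M\varepsilon_{t+1} + \hat x_{t,M+1} \tag{3.41}$$

$$= \frac{c_M}{c_0}\big(x(t+1) - \hat x_{t,1}\big) + \hat x_{t,M+1}, \qquad \text{when } M \le q; \tag{3.42}$$

where $\varepsilon_{t+1}$ is replaced by the definition of the innovation (see (2.30)).
Hence we have

$$\hat x_{t+1,M} = \hat x_{t,M+1} + \frac{c_M}{c_0}\big(x(t+1) - \hat x_{t,1}\big), \qquad \text{for } 1 \le M \le q, \tag{3.43}$$

and for $M = q$, by (3.40),

$$\hat x_{t,q+1} = -\sum_{l=1}^{p}\varphi_l\hat x_{t,q+1-l},$$

i.e.

$$\hat x_{t,q+1} = \begin{cases} -\sum_{l=1}^{p}\varphi_l\hat x_{t,q+1-l}, & p \le q,\\[4pt] -\Big(\sum_{l=1}^{q}\varphi_l\hat x_{t,q+1-l} + \sum_{l=q+1}^{p}\varphi_l x(t+q+1-l)\Big), & p > q, \end{cases} \tag{3.44}$$

where we recall that

$$\hat x_{t,q+1-l} = x(t+q+1-l), \qquad q+1 \le l \le p.$$

Summarizing the results (3.43) and (3.44), we obtain the prediction formula
(3.33) for the vector $W_{t+1}^q$.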
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Example 2. Let the ARM A model equation be 

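$$x(t) - 0.8x(t-1) = \varepsilon(t) - \varepsilon(t-1) + 0.24\varepsilon(t-2). \tag{3.45}$$

Now, the parameters of the model are $p = 1$, $q = 2$, $\varphi_1 = -0.8$, $\theta_0 = 1$, $\theta_1 = -1$,
$\theta_2 = 0.24$.

The Wold coefficients are

$$c_0 = 1, \qquad c_1 = -0.2, \qquad c_2 = 0.08, \tag{3.46}$$

and

$$\varphi_1 = -0.8, \qquad \varphi_j = 0, \quad j > 1. \tag{3.47}$$

Then, we have

$$\begin{pmatrix}\hat x_{t+1,1}\\ \hat x_{t+1,2}\end{pmatrix} = \begin{pmatrix}0.2 & 1\\ -0.08 & 0.8\end{pmatrix}\begin{pmatrix}\hat x_{t,1}\\ \hat x_{t,2}\end{pmatrix} + \begin{pmatrix}-0.2\\ 0.08\end{pmatrix}x(t+1), \tag{3.48}$$

where $\beta = 0$ since $q > p$.

For $M > q = 2$, by (3.40), we know that

$$\hat x_{t,M} = 0.8\hat x_{t,M-1} = \cdots = (0.8)^{M-2}\hat x_{t,2}. \tag{3.49}$$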


Example 3. Suppose that the model equation of $x(t)$ is

$$x(t) + 0.54x(t-1) - 0.3x(t-2) = \varepsilon(t) - 0.5\varepsilon(t-1). \tag{3.50}$$

Now, $p = 2$, $q = 1$, $\varphi_1 = 0.54$, $\varphi_2 = -0.3$, $\theta_0 = 1$, $\theta_1 = -0.5$, and the Wold
coefficients are $c_0 = 1$, $c_1 = -1.04$, $c_2 = 0.86$. By (3.33), we have

$$\hat x_{t+1,1} = \Big(-\frac{c_1}{c_0} - \varphi_1\Big)\hat x_{t,1} + \frac{c_1}{c_0}x(t+1) + (-\varphi_2 x(t)). \tag{3.51}$$

When $M \ge 2$, by (3.40), we know that

$$\hat x_{t+1,M} = -0.54\hat x_{t+1,M-1} + 0.3\hat x_{t+1,M-2}. \tag{3.52}$$

In practical applications, we need the initial prediction vector

$$W_{t_0}^q = (\hat x_{t_0,1}, \hat x_{t_0,2}, \ldots, \hat x_{t_0,q})^T \tag{3.53}$$

to start the recursive algorithm (3.33). Since this initial vector is not
observed, we may take the first $q$ observed values in place of (3.53). In
general, when $t_0$ is far enough from $t$, i.e. $t - t_0$ is large enough, the effect
of the choice of initial values on the prediction can be neglected.
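
The Wold coefficients quoted in Examples 2 and 3, and the one-step update (3.51), can be checked with a short Python sketch. It assumes the standard ARMA recursion $c_j = \theta_j - \sum_{l \ge 1}\varphi_l c_{j-l}$ (with $\theta_j = 0$ for $j > q$), which follows from matching coefficients in the model equation. The function names and the test values passed to the update are hypothetical.

```python
def arma_wold(phi, theta, n):
    """First n Wold coefficients of sum_k phi[k] x(t-k) = sum_l theta[l] e(t-l)."""
    c = []
    for j in range(n):
        th = theta[j] if j < len(theta) else 0.0
        ar = sum(phi[l] * c[j - l] for l in range(1, min(j, len(phi) - 1) + 1))
        c.append(th - ar)
    return c

def update_q1(phi, c, x_hat_1, x_new, x_old):
    """One step of (3.51) for an ARMA(2, 1) model.

    x_hat_1 is the prediction of x(t+1) made at time t,
    x_new is the observed x(t+1) and x_old is x(t).
    """
    g = c[1] / c[0]
    return (-g - phi[1]) * x_hat_1 + g * x_new - phi[2] * x_old

c_ex2 = arma_wold([1.0, -0.8], [1.0, -1.0, 0.24], 3)      # Example 2
c_ex3 = arma_wold([1.0, 0.54, -0.3], [1.0, -0.5], 3)      # Example 3
step = update_q1([1.0, 0.54, -0.3], c_ex3, 1.0, 2.0, 3.0)
print([round(v, 2) for v in c_ex2], [round(v, 2) for v in c_ex3], round(step, 2))
```

The update can be cross-checked by hand with (3.40) and (3.43): for the arbitrary inputs above, $\hat x_{t,2} = -0.54 \cdot 1 + 0.3 \cdot 3 = 0.36$, and $0.36 - 1.04\,(2 - 1) = -0.68$, the value the function returns.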



§3.2 The Linear Filtering of Time Series

The prediction problem is to forecast the future on the basis of past
observations, while the main problem of filtering is to extract the signal
from the noise background on the basis of the observed data. At first glance these
two problems look quite different, but they actually have very similar mathematical
backgrounds.


Definition 3.1 (Stationary Correlated). Suppose that $x(t)$, $y(t)$ are two
stationary series ($Ex(t) = Ey(t) = 0$). We call them stationary correlated if, for $t$,
$t + s \in T$,

$$Ex(t+s)y(t) = R_{xy}(s) \tag{3.54}$$

holds, and $R_{xy}(s)$ is called the correlation function of $x(t)$, $y(t)$.


Since their means are assumed to be zero, their covariance function is equal to their
correlation function.

Now suppose that $x(t)$, $y(t)$ are two stationary correlated series. The filtering
problem can then be stated as follows: let $M$ be an integer; based on the observations
$\{x(s), s \le t\}$, we want to find an element

$$\hat\xi_{t,M} \in H_x(t) = \mathcal{L}\{x(s), s \le t\}$$

such that

$$\|y(t+M) - \hat\xi_{t,M}\|^2 = \inf_{\xi \in H_x(t)}\|y(t+M) - \xi\|^2 \tag{3.55}$$

holds.

Since $y(t+M) \in H_y = \mathcal{L}\{y(s), s = 0, \pm1, \pm2, \ldots\} \subset H$, and $H_x(t)$ is also a
subspace of $H$, the solution of (3.55) exists uniquely, namely

$$\hat\xi_{t,M} = \operatorname{Proj}_{H_x(t)}\{y(t+M)\}. \tag{3.56}$$

According to the theory of stochastic integrals (see Chap. 1), we know that $\hat\xi_{t,M}$
can be represented as

$$\hat\xi_{t,M} = \int_\Pi e^{it\lambda}H_M(\lambda)\,dZ_x(\lambda), \tag{3.57}$$

where $H_M(\lambda) \in L^2(dF_x)$ is called the Characteristic Function of the Filtering
(CFF).

In the filtering problem, readers will soon discover that they need not restrict
themselves to positive $M$ as in the prediction
problem. The filtering problem remains meaningful in the cases $M = 0$
and $M < 0$. In the sequel we shall call the three cases $M > 0$, $M = 0$ and $M < 0$
"prediction-filtering", "filtering" and "delayed filtering" respectively.

Now, we introduce the filtering theory on the basis of the co-spectrum function
or the correlation function.

Definition 3.2 (Co-spectrum Function). Suppose that $x(t)$, $y(t)$ are two
stationary correlated series and that their correlation function $R_{xy}(s)$ satisfies

$$\sum_{s=-\infty}^{\infty}|R_{xy}(s)| < +\infty; \tag{3.58}$$

then we call

$$f_{xy}(\lambda) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}R_{xy}(k)e^{-ik\lambda}, \qquad \lambda \in \Pi, \tag{3.59a}$$

the co-spectrum function of $x(t)$ and $y(t)$. Evidently, we have

$$R_{xy}(k) = \int_\Pi e^{ik\lambda}f_{xy}(\lambda)\,d\lambda, \qquad k = 0, \pm1, \pm2, \ldots \tag{3.59b}$$

Since we have

$$R_{yx}(k) = E(y(t+k)x(t)) = E(x((t+k)-k)y(t+k)) = R_{xy}(-k), \tag{3.60}$$

it follows that

$$f_{yx}(\lambda) = \frac{1}{2\pi}\sum_k R_{yx}(k)e^{-ik\lambda} = \frac{1}{2\pi}\sum_k R_{xy}(-k)e^{-ik\lambda} = \overline{f_{xy}(\lambda)}, \qquad \lambda \in \Pi. \tag{3.61}$$

Now, consider first the "whole-line" filtering problem: let $H_x = \mathcal{L}\{x(s),
s = 0, \pm1, \pm2, \ldots\}$ and let $M$ be an integer; how can the CFF
of $\hat\xi_{t,M} \in H_x$ be represented in terms of the co-spectrum $f_{xy}(\lambda)$ and the spectrum $f_{xx}(\lambda)$?

Theorem 3.4 (Whole-line Filtering). Suppose that $x(t)$, $y(t)$ are two stationary
correlated series, $R_{xy}(k)$ satisfies condition (3.58), and $f_{xx}(\lambda)$ is the spectral
function of $x(t)$. For any integer $M$, the optimum CFF of $\hat\xi_{t,M}$ is

$$\Phi_M(\lambda) = e^{iM\lambda}\frac{f_{yx}(\lambda)}{f_{xx}(\lambda)}, \qquad \lambda \in \Pi, \tag{3.62}$$

and $\hat\xi_{t,M}$ can be represented as

$$\hat\xi_{t,M} = \int_\Pi e^{it\lambda}\Phi_M(\lambda)\,dZ_x(\lambda) \in H_x \tag{3.63}$$

subject to the condition

$$\|y(t+M) - \hat\xi_{t,M}\|^2 = \inf_{\xi}\|y(t+M) - \xi\|^2. \tag{3.64}$$

Proof. Evidently, the variable

$$\hat\xi_{t,M} = \operatorname{Proj}_{H_x}\{y(t+M)\} \tag{3.65}$$

exists uniquely, and since it belongs to $H_x$, it can be represented as a stochastic
integral

$$\int_\Pi\varphi_{t,M}\,dZ_x, \tag{3.66}$$

and (3.66) is the projection (3.65) if and only if

$$(y(t+M) - \hat\xi_{t,M},\ x(n))_H = 0, \qquad \text{for } \forall n. \tag{3.67}$$

Accordingly, from (3.67) we have

$$0 = (y(t+M), x(n))_H - (\hat\xi_{t,M}, x(n))_H = R_{yx}(t+M-n) - \Big(\int_\Pi\varphi_{t,M}\,dZ_x,\ \int_\Pi e^{i\lambda n}\,dZ_x\Big) = R_{yx}(t+M-n) - \int_\Pi e^{-i\lambda n}\varphi_{t,M}f_{xx}(\lambda)\,d\lambda; \tag{3.68}$$

therefore, (3.68) leads to

$$\varphi_{t,M}(\lambda) = e^{i(t+M)\lambda}\frac{f_{yx}(\lambda)}{f_{xx}(\lambda)} = e^{it\lambda}\Phi_M(\lambda), \tag{3.69}$$

where

$$\Phi_M(\lambda) = e^{iM\lambda}\frac{f_{yx}(\lambda)}{f_{xx}(\lambda)}, \qquad R_{yx}(s) = \int_\Pi e^{is\lambda}f_{yx}(\lambda)\,d\lambda, \quad s = 0, \pm1, \pm2, \ldots$$

That ends the proof.

Corollary. Let $x(t) = y(t) + n(t)$, $t = 0, \pm1, \pm2, \ldots$, where $y(t)$ is the stationary
signal series independent of the stationary noise $n(t)$. Then the optimum CFF
(3.62) can be represented as

$$\Phi_M(\lambda) = e^{iM\lambda}\frac{f_{yy}(\lambda)}{f_{yy}(\lambda) + f_{nn}(\lambda)}, \qquad \lambda \in \Pi. \tag{3.70}$$

Proof. By $x(t) = y(t) + n(t)$, we have

$$R_{xx}(k) = E[(y(t+k) + n(t+k))(y(t) + n(t))] = R_{yy}(k) + R_{nn}(k), \tag{3.71}$$

since

$$Ey(s)n(t) = Ey(s)En(t) = 0. \tag{3.72}$$

Accordingly, we have

$$f_{xx}(\lambda) = f_{yy}(\lambda) + f_{nn}(\lambda). \tag{3.73}$$

Similarly, we can obtain

$$R_{yx}(k) = R_{yy}(k),$$

or equivalently,

$$f_{yx}(\lambda) = f_{yy}(\lambda), \qquad \lambda \in \Pi. \tag{3.74}$$

Substituting the results (3.73), (3.74) into (3.62), the conclusion of the
Corollary is clear. ∎
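
To see what (3.70) looks like numerically, the following Python sketch evaluates the CFF for a hypothetical case that is not taken from the text: the signal is an AR(1) series $y(t) = 0.8y(t-1) + e(t)$ with unit innovation variance, and the noise is white with unit variance. The spectral densities used are the standard ones for these two models, and the function names are invented for this example.

```python
import cmath
import math

def f_ar1(lam, a, sigma2=1.0):
    """Spectral density of the AR(1) signal y(t) = a*y(t-1) + e(t), Var e = sigma2."""
    return sigma2 / (2 * math.pi * abs(1 - a * cmath.exp(-1j * lam)) ** 2)

def f_white(lam, sigma2=1.0):
    """Spectral density of white noise with variance sigma2 (constant in lam)."""
    return sigma2 / (2 * math.pi)

def cff(lam, M, a, noise_var):
    """Phi_M(lam) = e^{i M lam} f_yy / (f_yy + f_nn), as in (3.70)."""
    fyy, fnn = f_ar1(lam, a), f_white(lam, noise_var)
    return cmath.exp(1j * M * lam) * fyy / (fyy + fnn)

for lam in (0.0, math.pi / 2, math.pi):
    print(round(lam, 3), round(abs(cff(lam, 0, a=0.8, noise_var=1.0)), 4))
```

In this assumed case the gain is $25/26$ at $\lambda = 0$, where the signal dominates, and falls to $1/4.24$ at $\lambda = \pi$, where the noise dominates. The factor $e^{iM\lambda}$ changes only the phase, not the modulus.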

The whole-line filtering problem is quite easy to solve, and is sometimes very
useful in applications. For example, suppose that the observation is

$$x(t) = y(t) + n(t), \qquad t = 0, \pm1, \pm2, \ldots, \tag{3.75}$$

where $y(t)$, $n(t)$ satisfy the conditions of the Corollary, and suppose also that the
spectral functions of $y(t)$ and $n(t)$ satisfy the "non-overlapping" condition, i.e.

$$f_{yy}(\lambda)f_{nn}(\lambda) = 0, \qquad \lambda \in \Pi, \tag{3.76}$$

and

$$f_{yy}(\lambda) \ne 0, \qquad \lambda \in A \subset \Pi; \tag{3.77}$$

then the optimum CFF is

$$\Phi_M(\lambda) = \begin{cases} e^{iM\lambda}, & \lambda \in A;\\ 0, & \lambda \in \bar A. \end{cases} \tag{3.78}$$
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For pure-filtering, M = 0, we have 


$$\Phi_0(\lambda) = \begin{cases} 1, & \lambda \in A;\\ 0, & \lambda \in \bar A, \end{cases} \tag{3.79}$$

that is,

$$\Phi_0(\lambda) = \chi_A(\lambda), \qquad \lambda \in \Pi. \tag{3.80}$$

Accordingly, the optimum filtering can be represented as

$$\hat\xi_{t,0} = \int_\Pi e^{it\lambda}\Phi_0(\lambda)\,dZ_x(\lambda) = \int_\Pi e^{it\lambda}\Phi_0(\lambda)\,dZ_y(\lambda) + \int_\Pi e^{it\lambda}\Phi_0(\lambda)\,dZ_n(\lambda), \tag{3.81}$$

where $dZ_y$, $dZ_n$ are the stochastic measures of the series $y(t)$ and $n(t)$.
By (3.80), we obtain the optimum filtering of $x(t)$ as

$$\hat\xi_{t,0} = \int_\Pi e^{it\lambda}\chi_A(\lambda)\,dZ_y(\lambda) + \int_\Pi e^{it\lambda}\chi_A(\lambda)\,dZ_n(\lambda) = \int_\Pi e^{it\lambda}\,dZ_y(\lambda) = y(t), \tag{3.82}$$

since the support of the measure $dZ_y$ is the set $A$ and that of $dZ_n$ is $\bar A$.

Thus (3.82) shows that, no matter how strong the noise is, if condition
(3.76) is fulfilled, the signal $y(t)$ can be recovered by the filtering (3.82).
A filter with CFF (3.80) is called an "ideal filter" in engineering.

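In the general case, we can only have

$$\Phi_0(\lambda) = \frac{1}{1 + \dfrac{f_{nn}(\lambda)}{f_{yy}(\lambda)}} \le 1, \qquad \lambda \in \Pi. \tag{3.83}$$

Now, suppose that $f_{yy}(\lambda)$ and $f_{nn}(\lambda)$ overlap on $(a, b)$ as shown in Fig.
3.1; then the CFF $\Phi_0(\lambda)$ is as in Fig. 3.2.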

So far, we have discussed the whole-line filtering problem and solved it
easily by the projection method. Solving the half-line filtering problem is
much harder. However, Theorem 3.4 is a basic tool for solving
such problems. Indeed, suppose that $x(t)$ is an ARMA model; we know that there
are two isomorphic Hilbert spaces, $H_x$ and $L^2(dF_x)$ (see Theorem 1.4).

In $H_x$ the innovation series

$$\varepsilon(t) = \int_\Pi e^{it\lambda}\,\Gamma^{-1}(e^{-i\lambda})\,dZ_x(\lambda), \qquad t = 0, \pm1, \pm2, \ldots \tag{3.84}$$

is an orthonormal basis, where

$$\Gamma(z) = \frac{\Theta(z)}{\Phi(z)}, \qquad |z| \le 1.$$

Fig. 3.1 Spectral functions of signal and noise

Fig. 3.2 CFF $\Phi_0(\lambda)$

Correspondingly, in $L^2(dF_x)$,

$$\psi_t(\lambda) = \Gamma^{-1}(e^{-i\lambda})e^{it\lambda}, \qquad t = 0, \pm1, \pm2, \ldots$$

are also an orthonormal basis, i.e. they possess the following properties:

1. $\psi_s(\lambda) \perp \psi_t(\lambda)$, $s \ne t$, (3.86)

2. $\psi_t(\lambda) \in L_t^2(dF_x) \ominus L_{t-1}^2(dF_x)$, (3.87)

where $L_s^2(dF_x) = \mathcal{L}\{e^{ik\lambda}, k \le s\}$.

3. $\|\psi_t(\lambda)\| = 1$, (3.88)

4. $\{\psi_t(\lambda), t = 0, \pm1, \pm2, \ldots\}$ is an orthonormal basis in $L^2(dF_x)$. (3.89)

In fact, we only need to consider the inner product of $\{\psi_t(\lambda)\}$:

$$(\psi_t(\lambda), \psi_s(\lambda))_{L^2} = \int_\Pi e^{i(t-s)\lambda}|\Gamma^{-1}(e^{-i\lambda})|^2\,dF_x(\lambda) = \int_\Pi e^{i(t-s)\lambda}\frac{f_x(\lambda)}{|\Gamma(e^{-i\lambda})|^2}\,d\lambda = \delta_{t,s}, \tag{3.90}$$

which proves assertions 1 and 3. Furthermore, as $\Gamma^{-1}(z)$ is analytic in $|z| <
\rho$, $\rho > 1$, we have the Taylor series

$$\Gamma^{-1}(z) = \sum_{k=0}^{\infty}d_k z^k, \qquad |z| \le 1, \tag{3.91}$$

and hence

$$\psi_t(\lambda) = \Gamma^{-1}(e^{-i\lambda})e^{it\lambda} = \sum_{k=0}^{\infty}d_k e^{i(t-k)\lambda} \in \mathcal{L}\{e^{is\lambda} : s \le t\} = L_t^2(dF_x). \tag{3.92}$$

Furthermore, for $s < t$, we have

$$(\psi_t(\lambda), e^{is\lambda})_{L^2} = \int_\Pi e^{it\lambda}\Gamma^{-1}(e^{-i\lambda})e^{-is\lambda}\,dF_x = \int_\Pi e^{i(t-s)\lambda}(\Gamma(e^{-i\lambda}))^{-1}(2\pi)^{-1}|\Gamma(e^{-i\lambda})|^2\,d\lambda$$

$$= \frac{1}{2\pi}\int_\Pi e^{i(t-s)\lambda}\sum_{k=0}^{\infty}c_k e^{ik\lambda}\,d\lambda = \sum_{k=0}^{\infty}c_k\frac{1}{2\pi}\int_\Pi e^{i(t-s+k)\lambda}\,d\lambda. \tag{3.93}$$

Since $t - s > 0$ and $k \ge 0$, we have $t - s + k > 0$, so that

$$(\psi_t(\lambda), e^{is\lambda})_{L^2} = 0, \qquad \forall s < t, \tag{3.94}$$

which proves

$$\psi_t(\lambda) \perp L_{t-1}^2(dF_x), \tag{3.95}$$

or equivalently

$$\psi_t(\lambda) \in L_t^2(dF_x) \ominus L_{t-1}^2(dF_x), \tag{3.96}$$

i.e. (3.87) holds.

As to conclusion 4, it follows easily from the isomorphic relationship

$$\varepsilon(t) \leftrightarrow \psi_t(\lambda) \tag{3.97}$$

that $\{\psi_t(\lambda), t = 0, \pm1, \pm2, \ldots\}$ is an orthonormal basis in $L^2(dF_x)$.

Now, the half-line filtering problem can be solved in the following
way.

To seek the optimum

$$\hat\xi_{t,M} = \operatorname{Proj}_{H_x(t)}\{y(t+M)\} \tag{3.98}$$

is equivalent to finding it by the following steps:

1. Find the element

$$z = \operatorname{Proj}_{H_x}\{y(t+M)\}. \tag{3.99}$$

2. The optimum filtering (3.98) is

$$\hat\xi_{t,M} = \operatorname{Proj}_{H_x(t)}\{z\}, \tag{3.100}$$

since we have

$$\|y(t+M) - \hat\xi_{t,M}\|^2 = \|y(t+M) - z\|^2 + \|z - \hat\xi_{t,M}\|^2 \tag{3.101}$$

(see Fig. 3.3).

Using the isomorphic relationships between $H_x$ and $L^2(dF_x)$ as well as between $H_x(t)$ and
$L_t^2(dF_x)$, we can rewrite (3.99), (3.100) in the function spaces $L^2(dF_x)$
and $L_t^2(dF_x)$.

The first step, i.e. seeking $z$, has been solved in Theorem 3.4; the CFF is
$\varphi_{t,M} = e^{i\lambda M}\varphi_{t,0} \in L^2(dF_x)$.

The second step is to expand $\varphi_{t,M}$ into the Fourier series with respect to the
basis $\{\psi_k(\lambda)\}$,

$$\varphi_{t,M}(\lambda) = \sum_k a_k\psi_k(\lambda), \tag{3.102}$$
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Fig. 3.3 The resolution of yt+M 

where 

(3.103) 

then the optimum CFF of the filtering ^ t ,M is 

t 

a k 7 PkWi (3.104) 

k=0 

therefore, the optimum filtering is 

itM = / 0Jk(A) dZ x (X). (3.105) 

k=o 

The reader can prove the following Theorem in the way as introduced above. 

Theorem 3.5 (Half-line Filtering). Suppose that x(t) satisfies the following 
ARMA model equation 

$((7)x(0 = 0(t/)g ： (O, 

and a: ⑷， y ⑴ are two stationary correlated series, their correlation function satisfies 

k 
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For Af > 0, the optimum filtering for y(t + Af) based on the observations {x(5); s < 
i} is 



Hm(X) dZ x (\) t 


(3.106) 


where 


Pk+M = J e *(* +M ) A / yl (A)(^) d\, fc = 0,1,2 ,…， 

©a = e(e-' A ). 


(3.107) 


(3.108) 


Example 4 (Half-line filtering). Let z ⑴， y ⑴ be two stationary correlated se¬ 
ries, their co-spectral function is 

/yx(A) = «- ,A (1 e : A at _: A) ， (3-109) 

and the spectral function of x(f) is 

= Aen, (3.110) 

where 0 < a,/? < 1, A = —?=, then 



For M = 0, in comparison with (3.108), we have 


Pq =B(l — a 2 ), B = 2ttc, 
Pi = - Ba, 

Pk = 0 , k > 1, 


(3.112) 



and 
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or 


H 0 {\) =B{{l-a 2 ) - (a 3 + j9(l- a 2 ))e~ ix 

OO 

+ - a) 

m=2 



e itX H 0 (X) dZ x {\) 


=B{{\ — a 2 )x(i) - (a 3 + /?(1 - a 2 ))x(t - l) 


+ a(0 — a) a M x(t — /i). 
m=2 


For M > 1, since Pk+M = 0, k >0, then Hm(^) = 0, 
For M = 1, P\ — —J3a, fik+i = 0, k > 1, we have 


^i(A) = 


-Bor 


1 + (a - p 、 e- iX + (1 - g ) 二 a/le ~"* 


/xA I 


^=2 


(3.113) 


(3.114) 


(3.115) 


i.e. the optimum filtering for half-line is 

it,M — —5a|x(i) + (o ： — P)x(t — 1) + (1 一丢 ) ^x(t — m) I . (3.116) 

When $x(t) = y(t) + n(t)$, $y(t)$ is independent of $n(t)$, and both
possess rational spectral functions, A. M. Yaglom (1962) solved the filtering
problem by complex analysis.


§3.3 Spectral Analysis of Time Series 

The main goal of this section is to introduce the theory and methods of spectral
analysis of time series.

Many practical problems in science and engineering are related to the
following subjects:

1. Let $x(1), x(2), \ldots, x(N)$ be $N$ sample observations. Are there
any harmonic (or seasonal) components in the data? How many? What are their
frequencies? Which is the strongest?

2. Suppose that $y(1), y(2), \ldots, y(N)$ are $N$ observations of a stationary series, say
a signal series. What is the distribution of the energy (or power)
of the series in the frequency domain? Is the distribution of energy comparatively flat
or sharp? Where is the extreme value? What is its bandwidth?

The first subject belongs to the hidden periodicities analysis of time series, or
discrete frequency component estimation; the second belongs to the
spectral density estimation of time series. The first topic has been investigated for more
than half a century, and the methods can be classified into classical hypothesis testing
methods and modern parametric estimation approaches based on large-sample
theory. The second topic can also be classified into parametric and non-parametric
approaches.

3.3.1 Theory and Methods of Hidden Periodicities Analysis. 


First of all, we shall introduce the hypothesis testing method of Grenander and 
Rosenblatt (1957), for detecting the hidden periodicities in noise data. Suppose 
that the model of observations is that of 

p 

x(t) = A k cos(u k t) + ^(t), (3.117) 

fc = 1 

where P is known, k = 1, 2,P, are unknown parameters, f(t) is i.i.d. 

^(0,a 2 ), and o is unknown parameter. Now, let the sample size N = 2m + 1， and 
put 


= Jjf x i k ) e ~ tkX » 

l 

i P = In ( 等 ) ， 1 < p < m. 


(3.118) 

(3.119) 


then Grenander and Rosenblatt suggested the following testing procedure: 


Ho: x[t) = $(t), 之 = 0, 士 1, 土 2,... 

Hi : P — r (i(i) possesses r frequency components), 1 < r < m. 

(3.120) 

Let /(r)be the rth largest value of {/ p }, put 

9{r) = (3.121) 

^p=l 

then the distribution of the statistic g(r) is 


P{d( r ) > z )= 


(r - 1)! ^ j(m -j)\[j - r)! 
j— r 


(3.122) 
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Shimshoni (1971) gave the percentage points for the distribution (3.122) for r = 
1,2,5,7,10,25,50. 

If the test rejects the Ho, then it shows that x(t) is not a pure white noise series 
and possesses r-frequency components. Suppose that the estimates {/(p)} have 
been arranged in the order 


/(l) > /(2) > ... > /(r), 

and the corresponding frequencies are 

A 2n 4 

Afc = ， k — 1,2,r, (3.123) 

then {Ajk,A; = 1,2, …, r} are estimates of hidden frequencies in x(t), and r is the 
order. 

The amplitudes vs. Ajt can be estimated by 
1 N 

^ k=： k = 1,2,...,r (3.124) 

n=l 

Finally, we call attention to the problem of the order $r$. In the preceding
hypothesis test, the parameter $r$ is assumed to be known a priori. Since
$r$ is usually unknown, many practitioners apply the test step by step:
they first put $r = 1$ and test. If $H_0$ is rejected, they put $r = 2$
and repeat the test, and so on, until, say, at $r = p + 1$, $H_0$ is accepted; then
they estimate the order as $p$. Theoretically, this stepwise testing is unacceptable,
since the distribution (3.122) is derived under $H_0$. When $H_0$ is rejected at $r = 1$,
$x(t)$ is not a white noise series, so the distribution changes
and (3.122) no longer holds.
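
For $r = 1$ the statistic $g(1)$ of (3.121) is Fisher's classical statistic, whose distribution under Gaussian white noise has the well-known closed form $P\{g(1) > z\} = \sum_{j=1}^{[1/z]}(-1)^{j-1}\binom{m}{j}(1-jz)^{m-1}$. The Python sketch below illustrates this special case only. The periodogram is left unnormalized, which is harmless because $g$ is a ratio. The data are synthetic and the function names are hypothetical.

```python
import cmath
import math

def periodogram_ordinates(x):
    """|sum_k x(k) e^{-i k lam_p}|^2 at lam_p = 2*pi*p/N, p = 1..m, N = 2m + 1."""
    N = len(x)
    m = (N - 1) // 2
    out = []
    for p in range(1, m + 1):
        lam = 2 * math.pi * p / N
        s = sum(xk * cmath.exp(-1j * lam * k) for k, xk in enumerate(x, start=1))
        out.append(abs(s) ** 2)
    return out

def fisher_g(I):
    """g(1): the largest ordinate divided by the sum of all ordinates."""
    return max(I) / sum(I)

def fisher_pvalue(g, m):
    """P{g(1) > g} under H0 (Gaussian white noise), Fisher's exact formula."""
    total = 0.0
    for j in range(1, min(m, int(1 / g)) + 1):
        total += (-1) ** (j - 1) * math.comb(m, j) * (1 - j * g) ** (m - 1)
    return total

# Synthetic data: a cosine at the Fourier frequency p = 5 plus a small ripple.
N = 41
x = [math.cos(2 * math.pi * 5 * t / N) + 0.1 * math.sin(1.3 * t * t)
     for t in range(1, N + 1)]
I = periodogram_ordinates(x)
g = fisher_g(I)
print(I.index(max(I)) + 1, round(g, 3), fisher_pvalue(g, len(I)))
```

The largest ordinate falls at $p = 5$, and the p-value is far below any usual significance level, so $H_0$ is rejected at $r = 1$. As the paragraph above explains, a second application of the same formula at $r = 2$ would not be justified.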

In recent years another approach to hidden periodicities analysis, based
on large-sample theory, has been investigated by a number of statisticians,
e.g. Wang (1983), He (1984), Chen (1987), Chen and Xie (1989), Li (1991), etc. In
the following paragraphs, we introduce the main results of He (1987), which are
rather convenient for practical applications. Since the proofs are very cumbersome,
they are omitted here; interested readers can find them in the original paper of He.

Definition 3.2 (Fourth Order Stationary). Let $\xi(t)$ be a real time series. It
is called fourth order stationary if the following conditions are fulfilled:

1. $E\xi^4(t) < +\infty$, (3.125)

2. $E\xi(t)\xi(t+n_1) = E\xi(0)\xi(n_1)$, (3.126)

3. $E\xi(t)\xi(t+n_1)\xi(t+n_2) = E\xi(0)\xi(n_1)\xi(n_2)$, (3.127)

4. $E\xi(t)\xi(t+n_1)\xi(t+n_2)\xi(t+n_3) = E\xi(0)\xi(n_1)\xi(n_2)\xi(n_3)$. (3.128)
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Definition 3.3 (Weak-P series). Suppose that $(i) is a fourth order stationary 
series, if for any ni,n 2 , 


sup I |Q(^i,n,n 2 + n)| i <M< 


(3.129) 


holds, then ^(t) is called a weak - 尸 series, where M is a constant, and 

Q(rai ， n 2 ， n 3 ) =£($(0)f(n l )C(n 2 )^(n 3 )) - ⑼咖 i))^(£(»i 2 )$(n 3 )) 

-- £(f(0)e(n 3 ))£(am)e(n2)). 

(3.130) 


Definition 3.4 (Strong-P Series). If 

oo 

， n 2 ， n 3)| < + 00 ， (3.131) 

Tl i ,fia ,1x3 = — oo 

instead of (3.129), then {(i) is called a strong - 尸 series. 

It is apparent that a strong-$P$ series must be a weak-$P$ series.
Weak-$P$ and strong-$P$ series represent a wide class of stochastic series appearing in
applications. Indeed, at least the following series are strong-$P$, and hence weak-$P$ too:

1. $\xi(t)$ is a Gaussian stationary series.

2. $\xi(t)$ is an i.i.d. series with finite fourth order moment.

3. $\xi(t)$ is a linear process, i.e.

$$\xi(t) = \sum_{k=0}^{\infty}c_k\varepsilon(t-k), \tag{3.132}$$

where $\{\varepsilon(t)\}$ is an i.i.d. series as in (2) and

$$\sum_k|c_k| < +\infty. \tag{3.133}$$

Accordingly, when $\varepsilon(t)$ is a series as in (2), ARMA, AR
and MA models are all strong-$P$ series.

The following linear model is quite useful in applications.

Definition 3.5 (Linear Model with Weak-P Residual). Let $x(t)$ be a linear
model of the form

    $x(t) = \sum_{k=1}^{P} \eta_k e^{it\lambda_k} + \xi(t)$, (3.134)

where

(1) $\xi(t)$ is a weak-P series with spectral density $f_\xi(\lambda) \in L^2[d\lambda]$;

(2) $\{\eta_k\}_1^P$ are random variables satisfying

    $0 < A \le |\eta_k| \le B < +\infty$, a.s., $k = 1, 2, \cdots, P$; (3.135)

(3) $P$ is a constant, and $\{\lambda_k\}$ satisfy

    $-\pi < \lambda_1 < \lambda_2 < \cdots < \lambda_P < \pi$; (3.136)

then we call $x(t)$ a linear model with weak-P residual, or briefly a weak-P model.

The following theorem is the essential result of He (1987) on detecting hidden
periodicities.

Theorem 3.6 (Large Samples Behaviour of Periodogram). Suppose that
$x(t)$ is a weak-P model (3.134), and put

    $J_N(\lambda) = N^{-31/16} \left| \sum_{k=1}^{N} x(k) e^{-ik\lambda} \right|^2$, $\lambda \in \Pi$; (3.137)

then, with probability 1, when $N$ is sufficiently large there exist two constants
$K_1, K_2$, independent of $N$, such that

(1) $\inf_{|\lambda - \lambda_j| \le \pi/N} \{J_N(\lambda)\} \ge K_1 N^{1/16}$, $j = 1, 2, \cdots, P$; (3.138)

(2) $\sup_{\lambda \in \Delta_N} \{J_N(\lambda)\} \le K_2 N^{-1/16}$, (3.139)

where

    $\Delta_N = \bigcap_{j=1}^{P} \{\lambda : N^{-15/16} \le |\lambda - \lambda_j| \le 2\pi - N^{-15/16}\}$. (3.140)

Theorem 3.6 reveals the following important behaviour. Suppose that $x(t)$ is a
weak-P model whose order $P$ and hidden frequencies $\{\lambda_k\}$ are unknown
parameters. Then for any $\gamma > 0$, when $\lambda \in NB(\lambda_j)$ (a neighbourhood
of $\lambda_j$), $1 \le j \le P$, we have $J_N(\lambda) > \gamma$ for $N > N_0$, a.s.;
conversely, for any $\delta > 0$, when $\lambda \notin NB(\lambda_j)$, we have
$J_N(\lambda) < \delta$ for $N > N_0$, a.s. (see Fig. 3.4).

Accordingly, the order $P$ can be estimated by the total number of peaks for
which $J_N(\lambda) > \gamma$ when $N$ is sufficiently large.

The preceding discussion is made mathematically precise in the following
definitions and theorems.

Fig. 3.4 Large samples behaviour of $J_N(\lambda)$.


Definition 3.6 ($\gamma$-interval). Let $\{g_N(\lambda)\}$ be a sequence of continuous
functions defined on $\Pi$, and let $\gamma > 0$ be given. Suppose $D_N \subset \Pi$ is an
interval of the form

    $D_N = (\alpha, \beta) \subset \Pi$, or $D_N = [-\pi, \alpha) \cup (\beta, \pi]$; (3.141)

then we call $D_N$ a $\gamma$-interval if it satisfies the following conditions:

(a) $g_N(\lambda) > \gamma$, for $\lambda \in D_N$; (3.142)

(b) $g_N(\alpha) = g_N(\beta) = \gamma$; (3.143)

(c) $\mu(D_N) \ge \cdots$, (3.144)

where $\mu(\cdot)$ is the Lebesgue measure.

In Fig. 3.5, $D_1 = (\alpha_1, \beta_1)$ and $D_3 = (\alpha_3, \beta_3)$ are $\gamma$-intervals, and
$D_2 = (\alpha_2, \beta_2)$ is not, since $\mu(D_2) < \cdots$.


Definition 3.7 (Order Estimate). Let $x(t)$ be a weak-P model and let $\gamma > 0$ be
given; then the total number $P_N$ of $\gamma$-intervals of $J_N(\lambda)$ is called the order
estimate of $x(t)$.

Fig. 3.5 $D_1$ is a $\gamma$-interval.

Then we have the following theorem. 

Theorem 3.7 (Consistency of Order Estimate). Suppose that $x(t)$ is a weak-P
linear model and $P_N$ is the order estimate; then

    $\lim_{N \to +\infty} P_N = P$, a.s. (3.145)

holds, where $P$ is the real order of $x(t)$.

Theorem 3.7 shows that the simple order estimate suggested by He is strongly
consistent. It can be shown that for any given $\gamma > 0$, when $N$ is sufficiently
large, with probability 1 there must exist $P$ $\gamma$-intervals, of the form

    $D_{j,N} = (\alpha_{j,N}, \beta_{j,N})$, $j = 1, 2, \cdots, P-1$,
    $D_{P,N} = (\alpha_{P,N}, \beta_{P,N})$ or $(\beta_{P,N}, \pi] \cup [-\pi, \alpha_{P,N})$. (3.146)

Let $\lambda_{j,N}$ be the frequency estimate of $\lambda_j$, which maximizes $J_N(\lambda)$ on the
interval

    $\bar{D}_{j,N} = [\alpha_{j,N}, \beta_{j,N}]$, $j = 1, 2, \cdots, P_N$; (3.147)
then we have the following theorem:

Theorem 3.8 (Consistency of Frequency Estimates). Let $x(t)$ be a weak-P
linear model, let $P_N$ be the order estimate of $x(t)$, and let $\{\lambda_{k,N}, k = 1, 2, \cdots, P_N\}$
be the estimates of the frequencies $\{\lambda_k, k = 1, 2, \cdots, P\}$. Then for any constant
$\rho < 17/16$, we have

    $\lim_{N \to +\infty} N^{\rho} (\lambda_{j,N} - \lambda_j) = 0$, a.s. (3.148)

Theorem 3.9 (Consistency of Amplitude Estimates). Under the conditions
of Theorem 3.8, put

    $\alpha_{j,N} = \frac{1}{N} \sum_{k=1}^{N} x(k) \exp\{-ik\lambda_{j,N}\}$, $j = 1, 2, \cdots, P_N$; (3.149)

then, for $\rho < 1/16$, we have

    $\lim_{N \to +\infty} N^{\rho} (\alpha_{j,N} - \eta_j) = 0$, a.s., (3.150)

where $\{\eta_j\}$ are the random amplitudes of $x(t)$ (see (3.134)).


Theorems 3.7, 3.8 and 3.9 solve the hidden periodicities problem, and the
algorithm is easy to implement. For a fixed sample size $N$ (the usual case in
applications), how to determine the constant $\gamma$ is the crucial point in He's
theory. Evidently, choosing the value of $\gamma$ too low or too high will cause the
algorithm to fail. It is also apparent that the selection of $\gamma$ should depend on
the sample size $N$; a reasonable formula is

where A, a are constants, e.g. a = 0.5, A = 0.1. 
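The procedure described by Theorems 3.6 to 3.9 can be sketched in a few lines. The following is only an illustrative sketch, not He's original implementation: the function name `hidden_periodicities`, the frequency grid, the simulated two-component signal and the threshold value are all assumptions made for the example, and the sketch ignores both the minimum-length condition of a $\gamma$-interval and intervals that wrap around $\pm\pi$.

```python
import numpy as np

def hidden_periodicities(x, gamma, n_grid=4096):
    """Estimate order, frequencies and amplitudes by thresholding J_N."""
    N = len(x)
    k = np.arange(1, N + 1)
    lam = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    # J_N(lambda) = N^(-31/16) |sum_k x(k) e^{-ik lambda}|^2, as in (3.137)
    J = N ** (-31 / 16) * np.abs(np.exp(-1j * np.outer(lam, k)) @ x) ** 2
    above = J > gamma
    # start/end indices of maximal runs of grid points with J > gamma
    edges = np.diff(np.concatenate(([0], above.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    freqs, amps = [], []
    for s, e in zip(starts, ends):
        lam_hat = lam[s + np.argmax(J[s:e])]      # maximiser of J_N on the run
        freqs.append(lam_hat)
        # amplitude estimate in the spirit of (3.149)
        amps.append(np.mean(x * np.exp(-1j * k * lam_hat)))
    return len(freqs), np.array(freqs), np.array(amps)

# Hypothetical test signal: two complex exponentials plus Gaussian white noise.
rng = np.random.default_rng(0)
N = 200
t = np.arange(1, N + 1)
x = 5.0 * np.exp(1j * t * (-1.0)) + 3.0 * np.exp(1j * t * 2.0) + rng.standard_normal(N)

P_hat, lam_hat, amp_hat = hidden_periodicities(x, gamma=5.0)
print(P_hat, lam_hat, np.abs(amp_hat))
```

Each run of grid points above the threshold plays the role of one $\gamma$-interval; the number of runs is the order estimate and the maximiser inside each run is the frequency estimate.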

The weak-P linear model (3.134) includes many models of practical interest;
e.g.

    $x(t) = \sum_{k=1}^{P} A_k \cos(\omega_k t + \theta_k) + \xi(t)$ (3.152)

is included in the model (3.134).

In fact, rewrite (3.152) in the form

    $x(t) = \sum_{k=1}^{2P} \eta_k e^{it\lambda_k} + \xi(t)$, (3.153)

where

    $\lambda_k = \omega_k$ for $k = 1, 2, \cdots, P$; $\lambda_k = -\omega_{k-P}$ for $k = P+1, P+2, \cdots, 2P$; (3.154)

    $\eta_j = \frac{1}{2} A_j e^{i\theta_j}$ for $j = 1, 2, \cdots, P$; $\eta_j = \frac{1}{2} A_{j-P} e^{-i\theta_{j-P}}$ for $j = P+1, P+2, \cdots, 2P$; (3.155)

then (3.153) is of the form (3.134), and we can use $2|\eta_j|$ and $\arg(\eta_j)$ to estimate
the amplitudes $A_k$ and phases $\theta_k$ respectively.
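The rewriting of a cosine term as two complex exponentials can be checked numerically. The sketch below uses hypothetical values for the amplitude, frequency and phase; it only verifies the identity behind (3.153)-(3.155) and the recovery rule $A = 2|\eta|$, $\theta = \arg(\eta)$.

```python
import numpy as np

A, omega, theta = 2.5, 0.7, 0.4          # hypothetical amplitude, frequency, phase
t = np.arange(1, 101)

eta_pos = 0.5 * A * np.exp(1j * theta)   # coefficient of e^{+it*omega}
eta_neg = 0.5 * A * np.exp(-1j * theta)  # coefficient of e^{-it*omega}

cosine_form = A * np.cos(omega * t + theta)
exp_form = eta_pos * np.exp(1j * t * omega) + eta_neg * np.exp(-1j * t * omega)

A_rec = 2 * np.abs(eta_pos)              # recovered amplitude
theta_rec = np.angle(eta_pos)            # recovered phase
```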

The following Monte-Carlo studies show the performance of He's algorithm.

Example 5. Let

    $x(t) = \sum_{k=1}^{7} \alpha_k e^{it\lambda_k} + \xi(t)$, $t = 1, 2, \cdots, N$, (3.156)

where $\xi(t)$ is an MA model

    $\xi(t) = 8\varepsilon(t) - 6\varepsilon(t-1) + \varepsilon(t-2)$, $\varepsilon(t) \sim$ i.i.d. $N(0, 1)$, (3.157)

and

    $SNR = \sum_{k=1}^{7} \alpha_k^2 / E\xi^2(t) = 1.5$.

Select $\gamma = 7.2$; then the estimated parameters of (3.156) are listed in
Table 3.1 for $N = 50, 100$.


Table 3.1

parameter      real      estimated (N = 50)    estimated (N = 100)
P              7         6                     7
$\lambda_1$    -2.76     -2.764                -2.764
$\lambda_2$    -2.12     -2.073                -2.104
$\lambda_3$    -1.76     -1.759                -1.759
$\lambda_4$    -1.00     -1.005                -1.005
$\lambda_5$    1.724     1.696                 1.727
$\lambda_6$    2.000     ***                   2.010
$\lambda_7$    2.35      2.332                 2.330
$\alpha_1$     5.1       5.733                 5.403
$\alpha_2$     6.7       0.775                 3.795
$\alpha_3$     -4.8      5.009                 -3.895
$\alpha_4$     -3.7      -3.619                -4.02
$\alpha_5$     3.9       2.649                 3.126
$\alpha_6$     3.0       ***                   1.87
$\alpha_7$     -4.23     -3.723                -3.283

Example 6. Let $x(t)$ be the linear model

    $x(t) = \sum_{k=1}^{6} \alpha_k e^{it\lambda_k} + \xi(t)$, (3.158)

where $\xi(t)$ is an ARMA(2,2) model

    $3\xi(t) + 3.5\xi(t-1) + \xi(t-2) = 32\varepsilon(t) - 24\varepsilon(t-1) + 4\varepsilon(t-2)$,
    $\varepsilon(t) \sim$ i.i.d. $N(0, 1)$,

and $SNR = 0.463$, $\gamma = 9.6$.

The estimated results for $N = 50, 100$ are listed in Table 3.2.


Table 3.2

parameter      real      estimated (N = 50)    estimated (N = 100)
P              6         6                     6
$\lambda_1$    -3.100    -3.141                -3.110
$\lambda_2$    -2.070    -2.074                -2.073
$\lambda_3$    -0.800    -0.753                -0.816
$\lambda_4$    0.300     0.314                 0.314
$\lambda_5$    1.290     1.256                 1.288
$\lambda_6$    2.300     2.324                 2.293
$\alpha_1$     -4.80     -1.64                 -4.00
$\alpha_2$     -3.900    -3.669                -4.200
$\alpha_3$     4.200     4.557                 3.862
$\alpha_4$     3.700     4.395                 3.377
$\alpha_5$     -4.230    -2.938                -4.869
$\alpha_6$     3.920     2.120                 3.272


Many other methods for the analysis of hidden periodicities can be found in
Pisarenko (1973), Priestley (1981), Chen and Xie (1989), Li and Xie (1991), etc.


3.3.2 Theory and Methods of Spectral Density Estimations.

In this section, we introduce the theory and methods of spectral estimation,
which can be classified into parametric and non-parametric approaches.
The former estimates the parameters contained in the model or in the spectral
density function; the latter introduces some statistic, say the periodogram, for
estimating the spectral density directly without assuming any parametric model.

A systematic theory and methods of parametric spectral estimation are presented
in the book of Dzhaparidze (1986). The reader can find some interesting
results there.

Now, we introduce a theorem on the AR spectral approximation, which
shows that under some mathematical conditions the AR spectral estimate is a
uniformly consistent estimate of the spectrum of a stationary time series.

Theorem 3.10 (Long order AR Approximation). Suppose that $x(t)$ is a
Gaussian stationary series*, the Wold decomposition of $x(t)$ (see (2.84)) is

    $x(t) = \sum_{k=0}^{\infty} c_k \varepsilon(t-k)$, (3.159)

and satisfies the following conditions:

(a) $\sum_k |c_k| < +\infty$, (3.160)

(b) $\sum_k c_k z^k \ne 0$, $|z| \le 1$. (3.161)

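Let $P(N)$ be a sequence of positive integers increasing to $+\infty$ with the
order

    $P(N) = o\left( \frac{\cdots}{\log N\,(\log\log N)^{1+\delta}} \right)$, for $\delta > 0$; (3.162)

then

    $\sup_{\lambda \in \Pi} |f_N(\lambda) - f_x(\lambda)| = o(1)$, (3.163)

where $f_x(\lambda)$ is the spectral density of $x(t)$, and

    $f_N(\lambda) = \frac{(\sigma^{P(N)})^2}{2\pi \left| 1 + \sum_{k=1}^{P(N)} \varphi_k^{(N)} e^{ik\lambda} \right|^2}$, $\lambda \in \Pi$, (3.164)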


with the parameters $((\sigma^{P(N)})^2, \varphi_1^{(N)}, \cdots, \varphi_{P(N)}^{(N)})$ satisfying the Yule-Walker
equation (see (2.141)):

    $R_{P(N)+1} (1, \varphi_1^{(N)}, \cdots, \varphi_{P(N)}^{(N)})^T = ((\sigma^{P(N)})^2, 0, \cdots, 0)^T$. (3.165)


In practical applications, one can select the order $P(N)$ as

    $P(N) = K N^{0.3}$, (3.166)

where $2 \le K \le 5$ is a reasonable range.
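As an illustration of the long-order AR approach, the following sketch fits an AR model of order $P(N) = K N^{0.3}$ by solving the Yule-Walker equations with sample covariances and evaluates the resulting AR spectrum. It is a minimal sketch under stated assumptions: the helper names, the choice $K = 3$, the removal of the sample mean, and the simulated AR(1) test series are illustrative and are not taken from the text.

```python
import numpy as np

def sample_autocov(x, max_lag):
    """Biased sample autocovariances for lags 0..max_lag (sample mean removed)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    N = len(x)
    return np.array([x[: N - k] @ x[k:] / N for k in range(max_lag + 1)])

def ar_spectrum(x, K=3.0, n_freq=2048):
    """AR spectral estimate with order p = round(K * N**0.3), cf. (3.166)."""
    N = len(x)
    p = int(round(K * N ** 0.3))
    r = sample_autocov(x, p)
    # Toeplitz covariance matrix R[i, j] = r[|i - j|]
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    # Yule-Walker equations for the model x(t) + sum_k phi_k x(t-k) = e(t)
    phi = np.linalg.solve(r[idx], -r[1:])
    sigma2 = r[0] + phi @ r[1:]
    lam = np.linspace(-np.pi, np.pi, n_freq, endpoint=False)
    k = np.arange(1, p + 1)
    transfer = 1.0 + np.exp(-1j * np.outer(lam, k)) @ phi
    f = sigma2 / (2 * np.pi * np.abs(transfer) ** 2)
    return lam, f, phi, sigma2

# Hypothetical test series: AR(1) with coefficient 0.6 and unit innovation variance.
rng = np.random.default_rng(1)
N = 2000
eps = rng.standard_normal(N)
x = np.zeros(N)
for t in range(1, N):
    x[t] = 0.6 * x[t - 1] + eps[t]

lam, f_hat, phi_hat, sigma2_hat = ar_spectrum(x)
f_true_at_zero = 1.0 / (2 * np.pi * (1 - 0.6) ** 2)
```

With the sign convention used here, the fitted first coefficient should be close to $-0.6$ for this test series, and the fitted spectrum integrates to the sample variance.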

The following theorem offers a direct method for estimating the ARMA spectrum. 


*In the original paper of An, Chen and Hannan (1982), the result of Theorem 3.10 does not
require the Gaussian assumption. Here we only give a brief statement of their results.

Theorem 3.11 (ARMA Spectrum Representation). Let $\eta(t)$ be an ARMA
series with model equation

    $\sum_{k=0}^{p} \varphi_k \eta(t-k) = \sum_{s=0}^{q} \theta_s \varepsilon(t-s)$.

Put

    $y(t) = \sum_{s=0}^{q} \theta_s \varepsilon(t-s)$ (3.167)

and denote its covariance function by $\{R_y(k), k = 0, 1, 2, \cdots\}$; then the spectral
function of $\eta(t)$ can be represented as

    $f_\eta(\lambda) = \frac{R_y(0) + 2 \sum_{k=1}^{q} R_y(k) \cos(k\lambda)}{2\pi \left| \sum_{k=0}^{p} \varphi_k e^{-ik\lambda} \right|^2}$. (3.168)


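Proof. By (3.167) we know that the spectral function of $y(t)$ is

    $f_y(\lambda) = \frac{1}{2\pi} \sum_{k=-q}^{q} R_y(k) e^{-ik\lambda} = \frac{1}{2\pi} \left( R_y(0) + 2 \sum_{k=1}^{q} R_y(k) \cos(k\lambda) \right)$. (3.169)

Substituting (3.169) into the numerator of the spectrum of $\eta(t)$, we have

    $f_\eta(\lambda) = \frac{f_y(\lambda)}{\left| \sum_{k=0}^{p} \varphi_k e^{-ik\lambda} \right|^2} = \frac{R_y(0) + 2 \sum_{k=1}^{q} R_y(k) \cos(k\lambda)}{2\pi \left| \sum_{k=0}^{p} \varphi_k e^{-ik\lambda} \right|^2}$. (3.170)

That ends the proof of the theorem. □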



In practical applications, $R_y(k)$ can be estimated by

    $\hat{R}_y(n) = \sum_{k=0}^{p} \sum_{s=0}^{p} \hat\varphi_k \hat\varphi_s \hat\gamma_{n-k+s}$, $n = 0, 1, \cdots, q$ (3.171)

(see (2.168)).

Other parametric methods of spectral estimation can be found in the book of
Dzhaparidze (1986).
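The representation (3.168) can be checked numerically against the usual ratio form of the ARMA spectrum. The sketch below assumes unit innovation variance and uses the ARMA(2,2) coefficients of Example 6 purely as test values; the function names are illustrative.

```python
import numpy as np

def arma_spectrum_via_Ry(phi, theta, lam, sigma2=1.0):
    """ARMA spectrum through the MA-part covariances R_y(k), as in (3.168)."""
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    q = len(theta) - 1
    # R_y(k) = sigma2 * sum_s theta_s * theta_{s+k}, k = 0..q
    Ry = np.array([sigma2 * theta[: q + 1 - k] @ theta[k:] for k in range(q + 1)])
    k = np.arange(1, q + 1)
    numer = Ry[0] + 2 * np.cos(np.outer(lam, k)) @ Ry[1:]
    j = np.arange(len(phi))
    denom = 2 * np.pi * np.abs(np.exp(-1j * np.outer(lam, j)) @ phi) ** 2
    return numer / denom

def arma_spectrum_direct(phi, theta, lam, sigma2=1.0):
    """ARMA spectrum as sigma2 |theta(e^{-i lam})|^2 / (2 pi |phi(e^{-i lam})|^2)."""
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    num = sigma2 * np.abs(np.exp(-1j * np.outer(lam, np.arange(len(theta)))) @ theta) ** 2
    den = 2 * np.pi * np.abs(np.exp(-1j * np.outer(lam, np.arange(len(phi)))) @ phi) ** 2
    return num / den

phi = [3.0, 3.5, 1.0]        # AR side of the ARMA(2,2) model in Example 6
theta = [32.0, -24.0, 4.0]   # MA side of the same model
lam = np.linspace(-np.pi, np.pi, 257)
f_a = arma_spectrum_via_Ry(phi, theta, lam)
f_b = arma_spectrum_direct(phi, theta, lam)
```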

Now, we want to introduce the theory and methods of spectral estimation by 
non-parametric approach. 

Definition 3.8 (Periodogram). Let $x(1), x(2), \cdots, x(N)$ be observed samples
of a second order stochastic series; then we call

    $I_N(\lambda) = \frac{1}{2\pi N} \left| \sum_{k=1}^{N} x(k) e^{-ik\lambda} \right|^2$, $\lambda \in \Pi$ (3.172)

the periodogram of $\{x(k), k = 1, 2, \cdots, N\}$.

Evidently, we can rewrite (3.172) as

    $I_N(\lambda) = \frac{1}{2\pi} \sum_{k=1-N}^{N-1} \hat\gamma_k e^{-ik\lambda}$, (3.173)

where

    $\hat\gamma_k = N^{-1} \sum_{s=1}^{N-k} x(s) x(s+k)$,
    $\hat\gamma_{-k} = \hat\gamma_k$, $k = 0, 1, \cdots, N-1$, (3.174)

are sample covariances.
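The equivalence of (3.172) and (3.173) is easy to confirm numerically. The sketch below uses a simulated real-valued sample (an assumption for illustration) and compares the squared-modulus form with the covariance form.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)               # hypothetical real-valued sample
lam = np.linspace(-np.pi, np.pi, 101)
k = np.arange(1, N + 1)

# (3.172): squared modulus of the finite Fourier transform
I_direct = np.abs(np.exp(-1j * np.outer(lam, k)) @ x) ** 2 / (2 * np.pi * N)

# (3.174): sample covariances for lags 0..N-1 (no mean correction, as in the text)
gamma = np.array([x[: N - h] @ x[h:] / N for h in range(N)])

# (3.173): for real data the two-sided sum reduces to a cosine series
lags = np.arange(1, N)
I_cov = (gamma[0] + 2 * np.cos(np.outer(lam, lags)) @ gamma[1:]) / (2 * np.pi)
```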

Since, under some mild conditions (see (1.128)), $\hat\gamma_k$ is a consistent estimate of the
covariance $R(k)$, one can expect that $I_N(\lambda)$ is a reasonable estimate of
$f_x(\lambda)$. Indeed, the following theorem shows that $I_N(\lambda)$ is an asymptotically
unbiased estimate of $f_x(\lambda)$.

Theorem 3.11. Suppose that $x(t)$ is a stationary series with covariance function
$R(k)$ satisfying

    $\sum_{k=-\infty}^{+\infty} |R(k)| < +\infty$;

then the following equality

    $\lim_{N \to +\infty} E\{I_N(\lambda)\} = f_x(\lambda)$, $\lambda \in \Pi$,

holds, where $f_x(\lambda)$ is the spectral density function of $x(t)$.
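Proof. By (3.173), we have

    $E\{I_N(\lambda)\} = \frac{1}{2\pi} \sum_{k=1-N}^{N-1} E\{\hat\gamma_k\} e^{-ik\lambda}$ (3.175)

    $\qquad = \frac{1}{2\pi} \sum_{k} R_N(k) e^{-ik\lambda}$, $\lambda \in \Pi$, (3.176)

where $R_N(k)$ is defined as

    $R_N(k) = (1 - |k|/N) R(k)$ for $|k| = 0, 1, 2, \cdots, N-1$; $R_N(k) = 0$ otherwise. (3.177)

However, $R_N(k)$ possesses the following properties:

1. $\lim_{N \to +\infty} R_N(k) = R(k)$, for any integer $k$; (3.178)

2. $|R_N(k)| \le |R(k)|$, $\sum_k |R(k)| < +\infty$. (3.179)

Then, by dominated convergence, we have

    $\lim_{N \to +\infty} E\{I_N(\lambda)\} = \frac{1}{2\pi} \sum_k \lim_{N \to +\infty} R_N(k) e^{-ik\lambda} = \frac{1}{2\pi} \sum_k R(k) e^{-ik\lambda} = f_x(\lambda)$, (3.180)

where the last equality of (3.180) is derived from the Wiener-Khinchin theorem (see
(1.57), (1.58)). □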


Even though $I_N(\lambda)$ is an asymptotically unbiased estimate of the spectral density
$f_x(\lambda)$, the following theorem shows that the periodogram in general is not a
consistent estimate.

Theorem 3.12 (Covariance of Periodogram). Suppose that $x(t)$ is a Gaussian
stationary series whose spectral density has a continuous derivative on $\Pi$; then
for $\lambda, \mu \in \Pi$,

    $Cov\{I_N(\lambda), I_N(\mu)\} = \frac{f_x(\lambda) f_x(\mu)}{N^2} \left[ \left( \frac{\sin \frac{N}{2}(\lambda + \mu)}{\sin \frac{1}{2}(\lambda + \mu)} \right)^2 + \left( \frac{\sin \frac{N}{2}(\lambda - \mu)}{\sin \frac{1}{2}(\lambda - \mu)} \right)^2 \right] + O\left( \frac{\log N}{N} \right)$ (3.181)

holds.


The proof of this theorem can be found in the book of Rosenblatt (1974).

Under the conditions of Theorem 3.12, the reader can easily obtain the following
conclusions:

(1) $\lim_{N \to +\infty} Var\{I_N(\lambda)\} = f_x^2(\lambda)$ for $\lambda \ne 0, \pm\pi$, and $= 2 f_x^2(\lambda)$ for $\lambda = 0, \pm\pi$; (3.182)

(2) $\lim_{N \to +\infty} Cov\{I_N(\lambda), I_N(\mu)\} = 0$, when $|\lambda \pm \mu| \ne 0, \pm\pi$. (3.183)

Combining the results of Theorem 3.11 with those of Theorem 3.12, we have

    $E\{I_N(\lambda) - f_x(\lambda)\}^2 = Var\{I_N(\lambda)\} + (E\{I_N(\lambda)\} - f_x(\lambda))^2 \not\to 0$, if $f_x(\lambda) \ne 0$, $\lambda \in \Pi$, (3.184)

which shows that $I_N(\lambda)$ does not converge to the spectral density in $L^2$, since
$Var(I_N(\lambda)) \not\to 0$ as $N \to +\infty$, for $\lambda \in \Pi$.

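In fact, the following example shows that even in the sense of convergence in
probability $I_N(\lambda)$ may not converge to $f(\lambda)$. Suppose that $x(1), x(2), \cdots, x(N)$ are
i.i.d. $N(0, 1)$ random variables, and define a modified periodogram as follows:

    $\tilde{I}_N(\lambda) = \frac{1}{N} \left| \sum_{k=1}^{N} x(k) e^{-ik\lambda} \right|^2 = |d_N(\lambda)|^2$, (3.185)

where

    $d_N(\lambda) = \frac{1}{\sqrt{N}} \sum_{k=1}^{N} x(k) e^{-ik\lambda}$.

Now, for $\lambda = 0$, $d_N(0) = \frac{1}{\sqrt{N}} \sum_{k=1}^{N} x(k)$ is a normal variable with

    $E\{d_N(0)\} = 0$,
    $Var\{d_N(0)\} = \frac{1}{N} \sum_{k=1}^{N} Var\{x(k)\} = 1$; (3.186)

thus $\tilde{I}_N(0)$ is a $\chi_1^2$ random variable for any positive integer $N$.

Therefore, for any $\delta > 0$,

    $P\{|\tilde{I}_N(0) - 1| > \delta\} \not\to 0$, $N \to +\infty$, (3.187)

where $2\pi f_x(0) = 1$; hence $\tilde{I}_N(0)/(2\pi) \not\to f_x(0) = \frac{1}{2\pi}$ in probability.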
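A small Monte-Carlo experiment illustrates (3.187). The sketch below is an illustration under assumptions (the function name, the number of replications and the value $\delta = 0.5$ are arbitrary choices): it estimates $P\{|\tilde{I}_N(0) - 1| > \delta\}$ for several sample sizes, and the estimated probability should stay roughly constant instead of shrinking as $N$ grows.

```python
import numpy as np

def exceed_prob(N, delta=0.5, reps=2000, seed=0):
    """Monte-Carlo estimate of P{|I_N(0) - 1| > delta} for i.i.d. N(0,1) data."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((reps, N))
    d0 = x.sum(axis=1) / np.sqrt(N)      # d_N(0), standard normal for every N
    I0 = d0 ** 2                         # modified periodogram at lambda = 0
    return float(np.mean(np.abs(I0 - 1.0) > delta))

probs = {N: exceed_prob(N) for N in (10, 100, 2000)}
print(probs)
```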

The way to improve the statistical properties of the periodogram is based on the
result (3.183), which shows that for different values of $\lambda, \mu$ with $|\lambda \pm \mu| \ne 0, \pm\pi$,
the periodogram ordinates are asymptotically uncorrelated. Since averaging
uncorrelated random variables usually reduces the variance, some researchers
suggested a weighted average of the periodogram as the spectral estimate, which can
be represented in the following form:

    $f_N(\lambda) = \int_\Pi W_N(\mu) I_N(\lambda - \mu)\, d\mu$, (3.188)

where $\{W_N(\lambda)\}$ is called the spectral window function. Equivalently, one can
represent this estimate in the time domain as

    $f_N(\lambda) = \frac{1}{2\pi} \sum_{k=1-N}^{N-1} \hat\gamma_k\, \omega_N(k) e^{-ik\lambda}$, (3.189)

where $\{\omega_N(k)\}$ are the Fourier coefficients of $\{W_N(\lambda)\}$, and $\{\hat\gamma_k\}$ is the sample
covariance

    $\hat\gamma_k = N^{-1} \sum_{s=1}^{N-k} x(s) x(s+k)$, $k = 0, 1, 2, \cdots, N-1$. (3.190)

A number of well-known window functions have been suggested by researchers
over the past three decades, and a systematic theory of selecting optimum window
functions has been developed; readers may find very interesting results in the books
of Brillinger (1981), Priestley (1981), Rosenblatt (1985), etc.

In short, the window coefficients $\{\omega_N(k)\}$ can be determined in the following way:

1. Find a kernel function $K(t)$ which is a real even function, whose derivative
exists almost everywhere on $[-1, 1]$, and which satisfies

    $K(0) = 1$, and $K(t) = 0$, $|t| > 1$. (3.191)

2. Select a sequence of positive integers $\{m_N\}$ such that

    a. $m_N \to +\infty$,

    b. $m_N / N \to 0$, $N \to +\infty$.

3. Put

    $\omega_N(k) = K(k / m_N)$, $k = 0, \pm 1, \cdots, \pm m_N$; (3.192)

we get

    $W_N(\lambda) = \frac{1}{2\pi} \sum_{k=-m_N}^{m_N} \omega_N(k) e^{-ik\lambda}$, $\lambda \in \Pi$. (3.193)

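For example, let the kernel function $K(t)$ be the following rectangular function:

    $K(t) = 1$ for $|t| \le 1$; $K(t) = 0$ otherwise; (3.194)

then the window coefficients, the window function and the spectral estimate can be
represented respectively in the following way:

    $\omega_N(k) = 1$, $|k| \le m_N$; (3.195)

    $W_N(\lambda) = \frac{1}{2\pi} \frac{\sin(m_N + \frac{1}{2})\lambda}{\sin(\lambda/2)}$; (3.196)

    $f_N(\lambda) = \frac{1}{2\pi} \sum_{k=-m_N}^{m_N} \hat\gamma_k e^{-ik\lambda}$, $\lambda \in \Pi$. (3.197)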
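The three-step construction of the window coefficients can be sketched generically. In the sketch below the function names and the truncation point $m = 10$ are assumptions for illustration; the code builds $\omega_N(k) = K(k/m_N)$ from a kernel, sums the series (3.193), and compares the result for the rectangular kernel with the closed form (3.196).

```python
import numpy as np

def lag_window(kernel, m):
    """Window coefficients w(k) = K(k/m) for k = -m..m, cf. (3.192)."""
    k = np.arange(-m, m + 1)
    return k, kernel(k / m)

def spectral_window(kernel, m, lam):
    """W(lambda) = (1/2pi) sum_k w(k) e^{-ik lambda}, cf. (3.193)."""
    k, w = lag_window(kernel, m)
    return (np.exp(-1j * np.outer(lam, k)) @ w).real / (2 * np.pi)

def rectangular(t):
    return np.where(np.abs(t) <= 1, 1.0, 0.0)

m = 10
lam = np.linspace(0.05, 3.0, 200)        # grid avoiding lambda = 0
W_sum = spectral_window(rectangular, m, lam)
W_closed = np.sin((m + 0.5) * lam) / np.sin(lam / 2) / (2 * np.pi)
```

Any other even kernel supported on $[-1, 1]$ can be passed to `spectral_window` in the same way.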


Some window functions with good performance are introduced in the following:

1. Bartlett window.

    $K(t) = 1 - |t|$ for $|t| \le 1$; $K(t) = 0$ otherwise.

    $W_N(\lambda) = \frac{1}{2\pi m_N} \left( \frac{\sin(m_N \lambda/2)}{\sin(\lambda/2)} \right)^2$. (3.198)

2. Hamming window.

    $K(t) = 0.54 + 0.46 \cos \pi t$ for $|t| \le 1$; $K(t) = 0$ otherwise. (3.199)

    $W_N(\lambda) = \frac{1}{2\pi} \left[ 0.54 \frac{\sin(m_N + \frac{1}{2})\lambda}{\sin(\lambda/2)} + 0.23 \frac{\sin(m_N + \frac{1}{2})(\lambda + \frac{\pi}{m_N})}{\sin((\lambda + \frac{\pi}{m_N})/2)} + 0.23 \frac{\sin(m_N + \frac{1}{2})(\lambda - \frac{\pi}{m_N})}{\sin((\lambda - \frac{\pi}{m_N})/2)} \right]$. (3.200)

Theorem 3. (Consistency of Spectral Estimate). Suppose that $x(t)$ is a Gaussian
stationary series whose spectral density has a continuous derivative on $\Pi$, and
$W_N(\lambda)$ is a window function as mentioned above; then the spectral estimate
(3.189) satisfies the following:

1. $\lim_{N \to \infty} E f_N(\lambda) = f_x(\lambda)$. (3.201)

2. $Var(f_N(\lambda)) \sim \frac{2\pi}{N} f_x^2(\lambda) \int W_N^2(x)\, dx$, $\lambda \ne 0, \pm\pi$;

   $Var(f_N(\lambda)) \sim \frac{4\pi}{N} f_x^2(\lambda) \int W_N^2(x)\, dx$, $\lambda = 0, \pm\pi$. (3.202)

Corollary. Under the conditions of Theorem 3, the following asymptotic results
hold for sufficiently large samples:

    $Var(f_N(\lambda)) \sim \frac{m_N}{N} f_x^2(\lambda) \|K\|^2$, $\lambda \ne 0, \pm\pi$;

    $Var(f_N(\lambda)) \sim \frac{2 m_N}{N} f_x^2(\lambda) \|K\|^2$, $\lambda = 0, \pm\pi$, (3.203)

where $\|K\|^2 = \int_{-1}^{1} K(t)^2\, dt$.

Based on the formula (3.203), one can easily see that the asymptotic relative
error of the estimate is ($\lambda \ne 0, \pm\pi$)

    $\delta^2 = \frac{Var(f_N(\lambda))}{f_x^2(\lambda)} \sim \frac{m_N}{N} \|K\|^2$. (3.204)

Hence, for fixed sample size $N$ and $m_N$, $\delta^2$ is determined by the kernel energy
$\|K\|^2$. Several such values are listed in the following:

1. Rectangular. $\|K\|^2 = 2$.

2. Bartlett. $\|K\|^2 = 2/3$.

3. Hamming. $\|K\|^2 = 0.7948$.

4. Parzen II. $\|K\|^2 = 0.5392$.

Cheng and Xie (1982) suggested a window function derived from the criterion
of maximum energy ratio in the frequency domain:

    $K(t) = 0.313 + 0.531 \cos \pi t + 0.156 \cos 2\pi t$ for $|t| \le 1$; $K(t) = 0$ otherwise,

    $\|K\|^2 = 0.502$,

which shows that this kernel function has a small value of $\|K\|^2$, and the
representation of the window function is much simpler than that of the Parzen II.
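The kernel energies quoted above can be reproduced by numerical integration. The sketch below is illustrative only (the dictionary keys and the trapezoidal rule are choices made for this example); the Parzen II kernel is omitted because its formula is not given in this section.

```python
import numpy as np

def kernel_energy(kernel, n=200001):
    """Trapezoidal approximation of the integral of K(t)^2 over [-1, 1]."""
    t = np.linspace(-1.0, 1.0, n)
    y = kernel(t) ** 2
    dt = t[1] - t[0]
    return float(np.sum((y[1:] + y[:-1]) / 2) * dt)

kernels = {
    "rectangular": lambda t: np.ones_like(t),
    "bartlett": lambda t: 1 - np.abs(t),
    "hamming": lambda t: 0.54 + 0.46 * np.cos(np.pi * t),
    "cheng_xie": lambda t: 0.313 + 0.531 * np.cos(np.pi * t)
                 + 0.156 * np.cos(2 * np.pi * t),
}
energies = {name: kernel_energy(K) for name, K in kernels.items()}
print(energies)
```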

According to the preceding results, the practical calculation of the spectral density
estimate for observed data can be summarized as follows:

1. According to the needs of the practical problem, the total number $m_N$ of
frequencies distributed on $[0, \pi]$ has to be given before the calculation; it
determines the resolution of the spectral estimates. Then, by (3.204), we have the
asymptotic formula for large samples

    $N \approx \frac{m_N \|K\|^2}{\delta_N^2}$, (3.205)

where $\delta_N^2 = Var(f_N)/f_x^2$, which shows approximately how many samples are
needed; the relative error $\delta_N$ can be chosen as 0.1-0.3. It is apparent that
better resolution (larger $m_N$) requires a larger sample size $N$.
Similarly, higher accuracy requires more samples.

2. When $N$ is determined and $\{x(1), x(2), \cdots, x(N)\}$ are the samples, the sample
covariance can be calculated by the following formulas:

    $y(k) = x(k) - \bar{x}$;

    $\bar{x} = N^{-1} \sum_{k=1}^{N} x(k)$;

    $\hat\gamma_y(k) = N^{-1} \sum_{s=1}^{N-k} y(k+s) y(s)$, $k = 0, 1, \cdots, m_N$.

3. The spectral density estimates on $[0, \pi]$ are

    $f_k = \frac{1}{2\pi} \left( \hat\gamma_y(0) + 2 \sum_{s=1}^{m_N} \hat\gamma_y(s)\, \omega_N(s) \cos\left( \frac{\pi k s}{m_N} \right) \right)$, $k = 0, 1, \cdots, m_N$, (3.206)

where $\{\omega_N(k)\}$ are the window coefficients.
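The three steps above can be sketched as follows. This is a minimal sketch under assumptions: the function names, the Bartlett kernel, the truncation point $m_N = 40$ and the simulated white-noise data are illustrative choices, not prescriptions from the text.

```python
import numpy as np

def required_samples(m, kernel_energy, delta):
    """Approximate sample size from (3.205): N ~ m * ||K||^2 / delta^2."""
    return m * kernel_energy / delta ** 2

def bartlett(t):
    return np.clip(1 - np.abs(t), 0.0, None)

def lag_window_spectrum(x, m, kernel):
    """Spectral estimates f_k at frequencies pi*k/m, k = 0..m, cf. (3.206)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    y = x - x.mean()                                    # step 2: remove the mean
    gamma = np.array([y[: N - k] @ y[k:] / N for k in range(m + 1)])
    s = np.arange(1, m + 1)
    w = kernel(s / m)                                   # window coefficients
    k = np.arange(m + 1)
    cosines = np.cos(np.pi * np.outer(k, s) / m)
    f = (gamma[0] + 2 * cosines @ (gamma[1:] * w)) / (2 * np.pi)   # step 3
    return np.pi * k / m, f

# Hypothetical data: Gaussian white noise, whose true spectral density is 1/(2*pi).
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
freqs, f_hat = lag_window_spectrum(x, m=40, kernel=bartlett)
n_needed = required_samples(40, 2.0 / 3.0, 0.1)
```

For white noise the estimates should fluctuate around $1/(2\pi)$, and `required_samples` gives the rough sample size for a target relative error.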

Finally, we want to point out that if the original record of the observation $\{y(t)\}$
is a continuous curve, then a digitizing procedure based on Shannon's
sampling theorem (see (1.147)) is necessary before carrying out the calculations
mentioned above. That is, suppose that $W$ is the upper bound of the frequency of the
observation; then take the samples as

    $\Delta = \frac{1}{2W}$,

    $x(k) = y(k\Delta)$, $k = 1, 2, \cdots, N$. (3.207)

Now, the estimated frequencies are distributed on $[0, 2\pi W]$ rather than on
$[0, \pi]$, and the estimated spectral amplitude of $y(t)$ is now

    $f_y(k) = \frac{1}{2W} f_k$, $k = 1, 2, \cdots, m_N$. (3.208)



PART TWO


CASE STUDIES IN
TIME SERIES ANALYSIS

CASE I

Digital Processing of a Dynamic Marine Gravity Meter


1. Problem Statement and Working Diagram of a Dynamic Marine Gravity Meter

Magnetic prospecting, seismic exploration and gravity prospecting are frequently
used methods in geophysical exploration. As we know, gravity varies over the
earth with latitude and with the stratum structure. Gravity prospecting measures
the variation of the gravitational acceleration at each observation point with
physical instruments (in general, the gravity value is approximately 980 cm/sec^2).
The fluctuation of the gravity values at different places in the prospecting field
usually offers very useful information about underground resources such as oil,
iron ore, etc. Since the gravity exploration method causes no demolition, it is also
better suited to preserving the ecological environment of the sea.

There are two kinds of gravity meters used for prospecting: one works in a static
state and the other in a dynamic fashion. In China in the 1970's, geological explorers
used only Canadian static gravity meters. To measure a gravity value at sea, such
an instrument has to be lowered into the sea while the boat is stopped. Evidently,
this operation heavily restricts the working speed of marine prospecting.

Scientists suggested another way to carry out the same task, namely to operate the
prospecting in a dynamic state. Their suggestion was to install an instrument on
the observation boat that works dynamically, so that gravity values can be obtained
continuously while the boat is under way. However, one of the most troublesome
problems is the following: when the gravity meter is operating in bad weather,
such as a gale, the boat is usually tossed up and down strongly by the waves. This
produces a very strong acceleration that interferes with the measurement of the
real gravity, since gravity is itself an acceleration. Compared with the fluctuation
signal of the real gravity, the magnitude of the interference acceleration may be
approximately 50,000-100,000 times greater. One naturally wants to know whether
it is possible to detect an extremely weak signal in a very strong noise background.
Some geologists and engineers were pessimistic, for they said that even if the whole
equipment were put on a gyroscope platform, the vertical acceleration component
produced by the bouncing of the boat would still exist. The engineers then posed
a challenging question: is it possible to solve such a difficult problem mainly by
mathematical methods?

The components and the working diagram suggested by the engineers are shown in
Fig. 1.1.

Fig. 1.1 Diagram of dynamic marine meter ZY-1 (blocks: aero-gyroscope platform,
facilities for reducing bumping, sensitive equipment, self-exciting amplifier with
positive feedback, alternative digital frequency, output s(t))

The physical principle of the sensitive equipment component in Fig. 1.1 may be
briefly illustrated as follows (see Fig. 1.2).

A copper block is hung by a metal wire; as soon as it vibrates in a magnetic
field an electric current is produced, and the vibration frequency depends on
the gravity value $g$. Roughly, we have $g = k f^2$, where $f$ is the frequency and $k$ is a
constant determined by physical factors such as the kind of metal block, the vacuum
level in the equipment, etc.


Fig. 1.2 The simplified structure of the sensitive equipment


2. The First Test for Solving the Problem

In many practical problems the first step towards a solution is to describe the working
system or procedure by mathematical formulas or models. Of course there are usually
a great many factors related to the problem, and we cannot expect that all
of them can be included in the model or formulas. We can only take into
account some of the key factors and the main working procedure in the mathematical
formalization.

In the signal processing problem mentioned above (see Fig. 1.1), the following
factors are generally considered:

a. The vertical acceleration caused by the boat's bouncing, which seriously
interferes with the correct measurement of the gravity values.

b. The long-period systematic measurement drift of the aero-gyroscope.

c. Geological corrections, such as those for different latitudes, etc.

d. The residual horizontal acceleration and the correction of the instantaneous
frequency measurement (see Section 4).

Since the systematic measurement drift of the gyroscope showed good characteristics
and the geological corrections are also easy to make by standard methods in
geophysics, we shall only discuss how to tackle a. and d. mentioned above.

Mathematically, we denote the interference acceleration by $n(t)$, the gravity signal
by $g(t)$ and the recorded observation by $y(t)$. Now, for the purpose of the
mathematical formalization it is necessary to know the relationship between
them: is it an additive model or a multiplicative model, i.e.

    $y(t) = g(t) + n(t)$ (1.1)

or

    $y(t) = g(t) n(t)$? (1.2)

Usually such a model selection problem is not easy for mathematicians to answer
by themselves, and discussion with engineers and scientists often helps.
In our situation, after discussion, the geologists and engineers confirmed that
the action of the interference $n(t)$ can be considered as additive noise.

The next step is to make some statistical assumptions about the noise
$n(t)$ and the signal $g(t)$. The stationarity of the process was examined on practical
records obtained by the Canadian gravity meters on a working line in the Eastern Sea
of China and was found to hold, so we assumed
that the signal process $g(t)$ is a stationary process. As to the noise $n(t)$, since the
working time required in practice for each observation point is limited to 10-20
minutes, the weather and the sea state in such a short
period can also be considered comparatively stationary. Therefore, we assumed that
the noise $n(t)$ is a stationary process independent of the signal $g(t)$.

Now, our problem can be classified as a filtering problem, i.e., suppose that a
segment of the observation $y(t)$, $t \in T$, is known and we want to obtain as good an
estimate of the signal $g(t)$ as possible.

However, another question about the working pattern of the filter is also very
important: should the filter be designed as “causal” or “noncausal”?

Mathematically, the difference can be illustrated as follows.

Let $\{\eta_t, t \in Z\}$ and $\{\xi_t, t \in Z\}$ be two stationarily correlated processes (see Section
2, Chapter 3), where $\eta_t$ is the observation series and $\xi_t$ the signal series. For
simplicity we restrict ourselves to linear filtering and put

    $H_\eta = \mathcal{L}\{\eta_t, t \in Z\}$, (1.3)

    $H_\eta(t) = \mathcal{L}\{\eta_s, s \le t\}$, (1.4)

for the two closed linear manifolds generated by $\eta_t$. Evidently, we have

    $H_\eta(t) \subset H_\eta$.

If the optimum solution is sought in $H_\eta$, the filtering works in a
“noncausal” way, since realizing it theoretically requires the whole-line observation,
i.e. all the data observed from the past to the future. If the
solution of the problem is obtained in $H_\eta(t)$, then the filtering is causal, since only
the data up to the current observation are needed.

The theory and methods of the first type of filtering for stationary series can be
found in Theorem 3.4 in Chapter 3, which is quoted briefly as follows.

Let $f_{\xi\eta}(\lambda)$ and $f_{\eta\eta}(\lambda)$ be the cross-spectrum of $\xi_t$ and $\eta_t$ and the auto-spectrum
of $\eta_t$ respectively; then the optimum filtering for $\xi_t$ based on the whole-line
observation is

    $\hat\xi_t = Proj_{H_\eta}\{\xi_t\} = \int_\Pi e^{it\lambda} \frac{f_{\xi\eta}(\lambda)}{f_{\eta\eta}(\lambda)}\, dZ_\eta(\lambda)$, $t \in Z$, (1.5)

where $dZ_\eta$ is the orthogonal stochastic measure of $\eta_t$ (see (1.109)). Now, suppose
that $f_{\xi\eta}(\lambda)/f_{\eta\eta}(\lambda)$ can be represented as a Fourier series

    $\frac{f_{\xi\eta}(\lambda)}{f_{\eta\eta}(\lambda)} = \sum_k h_k e^{-ik\lambda}$, $\lambda \in \Pi$, (1.6)

in $L^2[dF_\eta]$; then we have the linear form of the filtering (1.5)

    $\hat\xi_t = \sum_{k=-\infty}^{\infty} h_k \eta_{t-k}$, $t \in Z$. (1.7)

The second filtering version, based on the semi-line data $\{\eta_s, s \le t\}$, is of the
form



where 


= [ e» A dX, 5 = 0, 1 , 2 ,... (1.9) 

Jn r^e-^) 

and $\Gamma_\eta(z)$ is the maximal analytic function of $\eta_t$ (see Theorem 3.5). For an ARMA
model of $\eta_t$, $\Gamma_\eta(z)$ can be obtained by

rn(-) = ||^. 1-1 < 1. (1-10) 

(see Rozanov (1967)). 

Comparing these two results, the second version is evidently closer to the real
working procedure, i.e. the on-line pattern, but as a filtering problem it is
much more complicated to realize in practice, since it is not easy to obtain
the maximal analytic function $\Gamma_\eta(z)$ and the coefficients $\{h_s, s \ge 0\}$ when we
start from a segment of observations.

The first filtering version was recommended by our research group, but one
problem remained to be solved: how to reconcile the contradiction between
“the whole-line data are assumed to be known a priori” and
“only a semi-line of observations is known in practice” in our situation?

As an approximation, we assumed that the coefficients of the filter
$\{h_k, k \in Z\}$ in (1.7) decrease fast enough that the filtering formula (1.7) can
be rewritten as

    $\hat\xi_t \approx \sum_{k=-M}^{M} h_k \eta_{t-k}$, (1.11)

which means that for a segment of data $\{\eta_t, t_0 - M \le t \le t_0 + M\}$, the filtering
output is located at the middle point $t_0$. In other words, if such filtering is carried
out for the segment of data $\{\eta_t, t \le T\}$, then the output is

    $\hat\xi_{T-M} \approx \sum_{k=-M}^{M} h_k \eta_{T-M-k}$, (1.12)

i.e. we have an M-step delayed output with respect to the current time $T$ (see Fig.
1.3).
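The M-step delay of the truncated two-sided filter can be made concrete with a small sketch. Everything named below is hypothetical (the triangular weights, the record length and the helper `symmetric_filter`); the point is only that a filter using $\eta_{t-M}, \cdots, \eta_{t+M}$ can produce its last output for time $T - M$ when data are available up to time $T$.

```python
import numpy as np

def symmetric_filter(eta, h):
    """Apply sum_{k=-M}^{M} h_k * eta_{t-k}; h stores h_{-M}, ..., h_0, ..., h_M.

    Returns estimates only for the times at which all 2M+1 observations exist.
    """
    return np.convolve(eta, h, mode="valid")

M = 3
weights = np.array([1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0])   # hypothetical h_{-3..3}
h = weights / weights.sum()

rng = np.random.default_rng(0)
T = 50
eta = rng.standard_normal(T)             # observations at (0-based) times 0..T-1

out = symmetric_filter(eta, h)
# out[n] is the estimate for time n + M, so the last output refers to time T-1-M.
last_manual = sum(h[j] * eta[T - 1 - j] for j in range(2 * M + 1))
```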


Fig. 1.3 Output of a noncausal filter (time axis of the observation with the points
$T - 2M$ and $T - M$ marked)


In our practical problem, such a delay amounted to approximately 5-10 minutes in the
real geophysical prospecting at sea and was fortunately accepted by the geologists
and engineers.

Recalling the discussion of the Corollary of Theorem 3.4, we know that if the spectra
of the signal and the noise are non-overlapping, i.e.

    $f_{gg}(\lambda) f_{nn}(\lambda) = 0$, $\lambda \in \Pi$,
    $f_{gg}(\lambda) \ne 0$, $\lambda \in \Lambda \subset \Pi$ (1.13)

(see (3.76)-(3.82)), then the output

    $\hat{g}_t = \int_\Pi e^{it\lambda} \varphi_0(\lambda)\, dZ_\eta(\lambda) = g_t$ (1.14)

holds.

Fig. 1.4 Spectrum of the noise $f_{nn}(\lambda)$ in Eastern Sea of China


According to the real record of the fluctuation signal $g(t)$ of the gravity
obtained from the Eastern Sea of China, we know that the spectrum of this
signal is concentrated in the very low frequency range, while the noise is located
in a comparatively high frequency range (see Fig. 1.4).

In the first stage of our research, the conditions (1.11) and (1.13) were assumed to
be satisfied, which led to the following optimum filtering:

    $\hat{g}_t = \sum_{k=-M}^{M} h_k \eta_{t-k}$,

where $\{h_k\}$ are the Fourier coefficients of the OCFF (3.80) (in Chapter 3)

    $H(\lambda) = 1$ for $\lambda \in \Lambda$; $H(\lambda) = 0$ for $\lambda \notin \Lambda$. (1.15)

The support set $\Lambda$ of (1.15) can be taken as

    $\Lambda = \{\lambda : 0 \le \lambda \le \alpha\}$, (1.16)

where $\alpha$ is the upper bound frequency of the signal $g(t)$. From the real data $\alpha$ is
estimated as

    $\alpha = \frac{2\pi}{38}$, (1.17)

and the coefficients of the filter $\{h_k\}$ are

    $h_k = \frac{\sin k\alpha}{\pi k}$, $k = 0, \pm 1, \pm 2, \cdots$. (1.18)

The original record of the gravity is a continuous process; it is digitized with an
interval $\Delta$ (see the sampling Theorem 1.10, Chapter 1), and then the filtration
output is represented as

    $\hat{g}_t = \sum_{k=-M}^{M} \frac{\sin k\alpha\Delta}{k\pi\Delta} \eta_{t-k}$, $t = 0, \pm 1, \pm 2, \cdots$. (1.19)

As a preliminary trial of the whole system, we carried out an experiment in the
Miyun reservoir in Beijing for 3 months. Unfortunately, this experiment failed,
because our measurements deviated from the true gravity values by a factor of
100-1000! In fact, many problems led to the failure: the vacuum level in
the sensitive equipment decreased rapidly, the whole system was installed too close
to the engine of the boat, etc. However, one of the most important problems is that the
“optimum” CFF (1.18) which we selected does not perform well in
reducing the strong noise in the frequency domain when the data size is not big enough.

Analytically, we can rewrite the CFF of (1.18) as a sum of finite terms

$$H_0(\lambda) = \sum_{k=-N}^{N} h_k \cos k\lambda, \tag{1.20}$$

and $H_0(\lambda) \ne 0$ for $\alpha < \lambda \le \pi/\Delta$, so that some “energy” of the strong noise interferes
with the filtration of the signal; that is what is called the “leakage” problem in engineering.
In fact, not only is the CFF $H_0(\lambda)$ not good enough in $[\alpha, \pi]$; even in $[0, \alpha)$
we cannot obtain a straight line as $H(\lambda)$, since there are only a finite number of
terms in (1.20).*
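
The leakage of the truncated filter can be illustrated numerically. The following Python sketch is not part of the original study: it evaluates the finite sum (1.20) with the coefficients (1.18), taking $\alpha = 2\pi/38$ from (1.17) and, as assumptions made only for illustration, an order $N = 88$ and a stop-band grid that starts at $1.5\alpha$ to stay clear of the transition at $\alpha$.

```python
import numpy as np

def truncated_ideal_cff(lam, alpha, N):
    """H_0(lam) = sum_{k=-N}^{N} h_k cos(k*lam) with h_k = sin(k*alpha)/(pi*k)."""
    k = np.arange(1, N + 1)
    h = np.sin(k * alpha) / (np.pi * k)            # h_k for k >= 1 (h_{-k} = h_k)
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    # h_0 is the limit of sin(k*alpha)/(pi*k) as k -> 0, i.e. alpha/pi
    return alpha / np.pi + 2.0 * np.cos(np.outer(lam, k)) @ h

alpha = 2 * np.pi / 38          # cut-off frequency from (1.17)
N = 88                          # illustrative truncation order (assumption)
stop_band = np.linspace(1.5 * alpha, np.pi, 4000)
leak = np.max(np.abs(truncated_ideal_cff(stop_band, alpha, N)))
h0_at_zero = truncated_ideal_cff(0.0, alpha, N)[0]

print(f"H_0(0) = {h0_at_zero:.4f}")
print(f"max |H_0| on [1.5*alpha, pi] = {leak:.4f}")
```

Under these assumptions the stop-band ripple stays far above a tolerance such as $10^{-6}$, which is the behaviour the text describes as leakage.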


*In practical applications, if the process is digitized with an interval $\Delta$, then the CFF may be
represented as

$$H(\lambda) = \sum_{k} h_k \cos \lambda k\Delta, \qquad 0 \le \lambda \le \pi/\Delta,$$

and we assume $\Delta = 1$ in the following analysis.


3. Design a New Digital Filter under Min-Max Criterion

According to the preliminary experiment, several points are important for designing
the CFF of a new filter:

(1). Design a new filter with a finite number of terms directly.

(2). The signal $g_t$ is an unknown stochastic process and the pattern of its spectrum
is also unknown, but its energy is located mainly in a small interval $[0, \alpha)$, where
$\alpha$ is the upper bound frequency of the signal.

(3). The energy of the noise $n(t)$ is distributed over a comparatively high frequency
interval $[\alpha, \pi]$, and the CFF of the new filter should reduce the noise as much as
possible on $\{\lambda : \alpha \le \lambda\}$.

Therefore, we can formulate the mathematical problem of designing a new CFF
as follows:

a. Let $\{h_k,\ k = 0, \pm 1, \ldots, \pm N\}$ be real coefficients of the filter, which satisfy

$$h_k = h_{-k}, \qquad k = 0, 1, 2, \ldots, N. \tag{1.21}$$

b. The CFF of the filter, denoted by $H(\lambda)$, is of the form

$$H(\lambda) = \sum_{k=-N}^{N} h_k \cos k\lambda \qquad (0 \le \lambda \le \pi), \tag{1.22}$$

and

$$H(0) = 1 \tag{1.23}$$

for normalization.

c. For a truncated value $\alpha > 0$ and an error $\delta > 0$ given, we want to find an optimum
CFF (denoted as OCFF) $H^*(\lambda)$ such that

$$\max_{\alpha \le \lambda \le \pi} |H^*(\lambda)| = \inf_{\{h_k\}} \Big\{ \max_{\alpha \le \lambda \le \pi} |H(\lambda)| \Big\} \le \delta, \tag{1.24}$$

where the infimum runs over all $\{h_k\}$ which satisfy conditions (1.21)-(1.23).

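In (1.24), we cannot expect to design a filter for $\delta = 0$, since otherwise it would
lead to the following result

$$\delta \ge \max_{\alpha \le \lambda \le \pi} |H^*(\lambda)| \ge |H^*(\lambda)| = 0, \qquad \alpha \le \lambda \le \pi. \tag{1.25}$$

However, $\big|\sum_{k=-N}^{N} h_k \cos k\lambda\big| \equiv 0$, $\alpha \le \lambda \le \pi$, leads to the trivial case

$$h_k = 0, \qquad k = 0, \pm 1, \ldots, \pm N.$$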

In order to find the OCFF for a given $\delta > 0$, put

$$\mathcal{H} = \Big\{ H(\lambda) : H(\lambda) = \sum_{k=-N}^{N} h_k \cos k\lambda,\ h_k = h_{-k}\ (\text{real}),\ |H(\lambda)| \le 1,\ \alpha \le \lambda \le \pi \Big\}. \tag{1.26}$$

Evidently, if a function $\tilde H \in \mathcal{H}$ keeps

$$|\tilde H(0)| = c \ge |H(0)| \tag{1.27}$$

true for any $H \in \mathcal{H}$, then $c > 0$.

Indeed, let $h_0 = 1$, $h_k = 0$, $k \ne 0$; then $H(\lambda) \equiv 1 \in \mathcal{H}$. Since (1.27) is true, so

$$|\tilde H(0)| = c \ge |H(0)| = 1 > 0. \tag{1.28}$$


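The following three theorems solve our mathematical problem thoroughly.*

*The original proofs of these theorems were given by Professor Ci-he MIN in 1970 in our research
group.


Theorem 1.1. Suppose that $\{\tilde h_k,\ k = 0, \pm 1, \ldots, \pm N\}$ satisfy the condition (1.21) so
that $\tilde H(\lambda) \in \mathcal{H}$ and keep

$$|\tilde H(0)| = c \ge |H(0)| \tag{1.29}$$

true, where $H$ is any one of the elements in $\mathcal{H}$. Put

$$\hat h_k = \frac{1}{c}\,\tilde h_k, \qquad k = 0, \pm 1, \ldots, \pm N, \tag{1.30}$$

then $\{\hat h_k\}$ is the optimum CFF of (1.23) and (1.24) with

$$\delta = \frac{1}{c}. \tag{1.31}$$

Proof. Let

$$\hat H(\lambda) = \sum_{k=-N}^{N} \hat h_k \cos k\lambda, \qquad 0 \le \lambda \le \pi, \tag{1.32}$$

then

$$|\hat H(0)| = \Big|\sum_{k=-N}^{N} \tilde h_k\Big| \Big/ c = |\tilde H(0)|/c = 1, \tag{1.33}$$

and

$$|\tilde H(\lambda)| = \Big|\sum_{k=-N}^{N} \tilde h_k \cos k\lambda\Big| \le 1, \qquad 0 < \alpha \le \lambda \le \pi, \tag{1.34}$$

since $\tilde H(\lambda) \in \mathcal{H}$. Hence, we have

$$|\hat H(\lambda)| = \Big|\sum_{k=-N}^{N} \hat h_k \cos k\lambda\Big| = \frac{1}{c}\Big|\sum_{k=-N}^{N} \tilde h_k \cos k\lambda\Big| \le \frac{1}{c}, \qquad \alpha \le \lambda \le \pi. \tag{1.35}$$

Now, put $D = \max_{\alpha \le \lambda \le \pi} |\hat H(\lambda)|$; then $D \le \frac{1}{c}$ holds and satisfies the following
inequality

$$\inf_{\{h_k\}} \Big\{ \max_{\alpha \le \lambda \le \pi} |H(\lambda)| \Big\} = \delta \le D \le \frac{1}{c}. \tag{1.36}$$

Suppose that the coefficients of the OCFF which satisfy (1.23)-(1.24) are $\{h^*_k\}$. On putting

$$h_k = \frac{h^*_k}{\delta}, \qquad k = 0, \pm 1, \ldots, \pm N, \tag{1.37}$$

then

$$|H(\lambda)| = \Big|\sum_{k=-N}^{N} h_k \cos k\lambda\Big| = \frac{1}{\delta}\Big|\sum_{k=-N}^{N} h^*_k \cos k\lambda\Big| \le \frac{1}{\delta} \max_{\alpha \le \lambda \le \pi} \Big|\sum_{k=-N}^{N} h^*_k \cos k\lambda\Big| = 1, \qquad \alpha \le \lambda \le \pi, \tag{1.38}$$

which shows that $H \in \mathcal{H}$.

According to the conditions of this theorem (see (1.29))

$$|H(0)| \le |\tilde H(0)| = c, \tag{1.39}$$

and

$$|H(0)| = |H^*(0)|/\delta = 1/\delta \tag{1.40}$$

by (1.37) and (1.23).

Connecting the results of (1.39), (1.40) and (1.36), we have

$$\delta \ge 1/c \quad \text{and} \quad \delta \le 1/c, \tag{1.41}$$

thus

$$\delta = 1/c = D = \max_{\alpha \le \lambda \le \pi} |\hat H(\lambda)|. \tag{1.42}$$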

By Theorem 1.1 we know that the problem of seeking the OCFF is changed into
finding a function $\tilde H \in \mathcal{H}$ which satisfies the condition (1.29).

To solve this problem, we need to make a functional transformation of $H(\lambda)$.
As a matter of fact, we may consider the function $\cos(ky)$ as a polynomial function
of $\cos(y)$ of order $|k|$, which can be seen in the following way:

$$\cos(2y) = 2\cos^2(y) - 1,$$
$$\cos(3y) = 4\cos^3(y) - 3\cos(y),$$

etc.

For any positive integer $M$, the relationship between $\cos(My)$ and $\cos(y)$ can be
derived from

$$2^M \cos^M(y) = \sum_{s=0}^{M} C_M^s \cos(2s - M)y$$

by recursion.
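
The two statements above can be checked numerically. The following Python sketch is only an illustration and not part of the original derivation; it relies on NumPy's Chebyshev module and picks $M = 5$ arbitrarily for the power expansion.

```python
import math
import numpy as np
from numpy.polynomial import chebyshev as C

y = np.linspace(0.0, np.pi, 201)
x = np.cos(y)

# cos(k*y) equals the Chebyshev polynomial T_k evaluated at cos(y).
for k in range(7):
    T_k = C.chebval(x, [0] * k + [1])       # a series with a single 1 in slot k is T_k
    assert np.allclose(T_k, np.cos(k * y))

# Power-basis coefficients (constant term first) of T_2 and T_3.
t2 = C.cheb2poly([0, 0, 1])                 # coefficients of 2x^2 - 1
t3 = C.cheb2poly([0, 0, 0, 1])              # coefficients of 4x^3 - 3x

# The expansion 2^M cos^M(y) = sum_s C(M, s) cos((2s - M) y), here for M = 5.
M = 5
lhs = (2.0 * np.cos(y)) ** M
rhs = sum(math.comb(M, s) * np.cos((2 * s - M) * y) for s in range(M + 1))
identity_ok = np.allclose(lhs, rhs)
print(t2, t3, identity_ok)
```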

Now, consider $H(\lambda)$ as a polynomial of $\cos\lambda$ of order $N$ and put

$$P_N(\cos\lambda) = \sum_{k=-N}^{N} h_k \cos k\lambda, \qquad x = \cos\lambda, \quad 0 < \alpha \le \lambda \le \pi, \tag{1.43}$$

then we have

$$-1 \le x \le \cos\alpha = a < 1, \tag{1.44}$$

and

$$H(0) = P_N(1). \tag{1.45}$$

Similar to $\mathcal{H}$, we may also define

$$\mathcal{P} = \{ P_N(\cos\lambda) : \text{a real polynomial of order } N,\ |P_N(\cos\lambda)| \le 1,\ 0 < \alpha \le \lambda \le \pi \}, \tag{1.46}$$

then Theorem 1.1 can be changed into the following statement:

Theorem 1.2. Let $\tilde P_N(x) \in \mathcal{P}$ and keep

$$|\tilde P_N(1)| \ge |P_N(1)| \tag{1.47}$$

true, where $P_N$ is any one of the elements in $\mathcal{P}$; then $\tilde P_N(\cos\lambda)$ is the OCFF (up to the
normalization of Theorem 1.1) with

$$|\tilde P_N(x)| \le 1, \qquad -1 \le x \le a. \tag{1.48}$$


Proof. This theorem is just Theorem 1.1 stated in terms of the polynomial of
$\cos\lambda$ rather than $H(\lambda)$, which is a linear combination of $\cos k\lambda$. ∎

Theorem 1.3. For any $\alpha$, $0 < \alpha < \pi$, and an even positive number $N$, put

$$\tilde P_N(x) = \cos\Big( N \cos^{-1} \frac{2x - a + 1}{a + 1} \Big), \qquad -1 \le x \le a = \cos\alpha, \tag{1.49}$$

then $\tilde P_N(x)$ satisfies the conditions (1.47) and (1.48).

Proof. First, by a well-known result in the theory of functions (see, say, Theodore
J. Rivlin (1990)), we know that the Chebyshev polynomial $T_N(y)$, of order $N$, can
be represented as

$$T_N(y) = \cos(N \cos^{-1}(y)) = \cos(N\theta) = \sum_{m=0}^{[N/2]} (-1)^m C_N^{2m}\, y^{N-2m} (1 - y^2)^m, \qquad -1 \le y \le 1,\ y = \cos\theta. \tag{1.50}$$

For $-1 \le x \le a$, we have

$$-1 \le \frac{2x - a + 1}{a + 1} \le 1, \tag{1.51a}$$

thus (1.49) is a polynomial of order $N$, and keeps

$$|\tilde P_N(x)| \le 1, \qquad -1 \le x \le a, \tag{1.51b}$$

true.

Secondly, we need to prove that for any polynomial $P_N(x) \in \mathcal{P}$

$$|\tilde P_N(1)| \ge |P_N(1)| \tag{1.52}$$

holds.

In fact, put

$$x_m = \frac{1}{2}\Big[ (a + 1) \cos\frac{m\pi}{N} + (a - 1) \Big], \qquad m = 0, 1, 2, \ldots, N, \tag{1.53}$$

or equivalently,

$$\cos\frac{m\pi}{N} = \frac{2x_m - a + 1}{a + 1}, \qquad m = 0, 1, 2, \ldots, N, \tag{1.54}$$

then by $|\cos y| \le 1$, we have

$$-1 \le x_m \le a, \qquad m = 0, 1, 2, \ldots, N, \tag{1.55}$$

and

$$x_m < x_n \quad (\text{when } n < m). \tag{1.56}$$

Another important fact is: any polynomial $P_N(x)$ of order $N$ is uniquely
determined by the $N + 1$ values $P_N(x_m)$, $m = 0, 1, 2, \ldots, N$. Indeed, by the Lagrange
interpolation formula, we have

$$P_N(x) = \sum_{m=0}^{N} P_N(x_m) \frac{\prod_{n \ne m} (x - x_n)}{\prod_{n \ne m} (x_m - x_n)}, \tag{1.57}$$

then

$$|P_N(1)| \le \sum_{m=0}^{N} |P_N(x_m)| \, \frac{\prod_{n \ne m} (1 - x_n)}{\big|\prod_{n \ne m} (x_m - x_n)\big|}. \tag{1.58}$$

In (1.58), $(1 - x_n) > 0$, so

$$\prod_{n \ne m} (1 - x_n) > 0, \tag{1.59}$$

and

$$\prod_{n \ne m} (x_m - x_n) = \prod_{n < m} (x_m - x_n) \prod_{n > m} (x_m - x_n). \tag{1.60}$$

Since $\{x_n\}$ are decreasing numbers (see (1.56)), the sign of (1.60) is determined
by the first factor in (1.60), i.e.

$$\prod_{n \ne m} (x_m - x_n) = (-1)^m \Big| \prod_{n \ne m} (x_m - x_n) \Big|.$$

Joining $P_N \in \mathcal{P}$ with the results mentioned above, we may rewrite the inequality
(1.58) as

$$|P_N(1)| \le \sum_{m=0}^{N} (-1)^m \frac{\prod_{n \ne m} (1 - x_n)}{\prod_{n \ne m} (x_m - x_n)}. \tag{1.61}$$

However,

$$\tilde P_N(x_m) = \cos\Big( N \cos^{-1}\Big( \cos\frac{m\pi}{N} \Big) \Big) = (-1)^m, \tag{1.62}$$

thus putting $\tilde P_N(x_m)$ in place of $(-1)^m$ in (1.61), we have

$$|P_N(1)| \le \sum_{m=0}^{N} \tilde P_N(x_m) \frac{\prod_{n \ne m} (1 - x_n)}{\prod_{n \ne m} (x_m - x_n)} = \tilde P_N(1), \tag{1.63}$$

and that ends the proof of this theorem. ∎
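
The statements used in this proof can be checked numerically. The following Python sketch is an illustration under assumed values ($\alpha = 0.6$ and $N = 8$ are arbitrary choices, not taken from the text): it verifies the alternation of the shifted Chebyshev polynomial at the nodes $x_m$, its bound on $[-1, a]$, and compares its value at $x = 1$ with that of randomly generated rival polynomials rescaled to the same bound. The random comparison is only a sanity check, not a proof.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

alpha, N = 0.6, 8                      # illustrative values only (N even)
a = np.cos(alpha)

def shifted(x):
    """Map [-1, a] onto [-1, 1], the argument used in the theorem."""
    return (2.0 * np.asarray(x) - a + 1.0) / (a + 1.0)

def p_tilde(x):
    """Shifted Chebyshev polynomial; the recurrence also works for x > a."""
    return C.chebval(shifted(x), [0] * N + [1])

m = np.arange(N + 1)
x_m = 0.5 * ((a + 1.0) * np.cos(m * np.pi / N) + (a - 1.0))    # interpolation nodes
nodes_alternate = np.allclose(p_tilde(x_m), (-1.0) ** m)        # values (-1)^m

grid = np.linspace(-1.0, a, 5001)
bounded = np.max(np.abs(p_tilde(grid))) <= 1.0 + 1e-9           # |P| <= 1 on [-1, a]

rng = np.random.default_rng(0)
best_rival = 0.0
for _ in range(200):
    coef = rng.normal(size=N + 1)                                # random degree-N polynomial
    coef /= np.max(np.abs(C.chebval(shifted(grid), coef)))       # rescale so |P| <= 1 on the grid
    best_rival = max(best_rival, abs(C.chebval(shifted(1.0), coef)))

print(nodes_alternate, bounded, best_rival < p_tilde(1.0))
```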


By Theorem 1.3, we may obtain

$$\tilde H(\lambda) = \cos\Big( N \cos^{-1} \frac{2\cos\lambda - \cos\alpha + 1}{\cos\alpha + 1} \Big), \qquad \alpha \le \lambda \le \pi, \tag{1.64}$$

$$|\tilde H(\lambda)| \le 1, \qquad \alpha \le \lambda \le \pi, \tag{1.65}$$

and $|\tilde H(0)|$ has the maximum value.

Accordingly, the OCFF is

$$H^*(\lambda) = \delta \cos\Big( N \cos^{-1} \frac{2\cos\lambda - \cos\alpha + 1}{\cos\alpha + 1} \Big), \quad \alpha \le \lambda \le \pi; \qquad \delta = 1/c = 1/\tilde P_N(1). \tag{1.66}$$

As to the functional representation of the OCFF in the interval $[0, \alpha)$, (1.66) is
no longer valid, since for $\alpha > 0$, $1 - \cos\alpha > 0$ and $1 + \cos\alpha < 2$, so that

$$\frac{2 + (1 - \cos\alpha)}{1 + \cos\alpha} > 1,$$

which means that the expression

$$\cos^{-1}\Big( \frac{2\cos\lambda - \cos\alpha + 1}{\cos\alpha + 1} \Big)\Big|_{\lambda = 0}$$

is undefined.

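Fortunately, since the Chebyshev function $H^*(\lambda)$ (see (1.50)) is a polynomial of
$\cos\lambda$ of order $N$, its values on $[0, \alpha)$ are also determined uniquely by (1.66), and
so we can use another representation of the Chebyshev function (see Ex. 1.1.1 in Rivlin
(1990))

$$T_N(y) = \frac{1}{2}\Big\{ \big(y + \sqrt{y^2 - 1}\big)^N + \big(y - \sqrt{y^2 - 1}\big)^N \Big\}, \qquad -\infty < y < +\infty, \tag{1.67}$$

for $\tilde P_N(x)$, i.e.

$$\tilde P_N(x) = \frac{1}{2}\Bigg\{ \Bigg( \frac{2x - a + 1}{a + 1} + \sqrt{\Big( \frac{2x - a + 1}{a + 1} \Big)^2 - 1} \Bigg)^N + \Bigg( \frac{2x - a + 1}{a + 1} - \sqrt{\Big( \frac{2x - a + 1}{a + 1} \Big)^2 - 1} \Bigg)^N \Bigg\}, \tag{1.68}$$

where $a = \cos\alpha$, and

$$\tilde P_N(1) = \frac{1}{2}\Bigg\{ \Bigg( \frac{3 - a}{a + 1} + \sqrt{\Big( \frac{3 - a}{a + 1} \Big)^2 - 1} \Bigg)^N + \Bigg( \frac{3 - a}{a + 1} - \sqrt{\Big( \frac{3 - a}{a + 1} \Big)^2 - 1} \Bigg)^N \Bigg\}. \tag{1.69}$$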

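Now, put $\delta = 1/\tilde P_N(1)$, $C_\lambda = \cos\lambda$; then we have the final result for the OCFF

$$H^*(\lambda) = \begin{cases} \delta \cos\big(N \cos^{-1} \theta(\lambda)\big), & \alpha \le \lambda \le \pi; \\[4pt] \dfrac{\delta}{2}\Big\{ \big(\theta(\lambda) + \sqrt{\theta^2(\lambda) - 1}\big)^N + \big(\theta(\lambda) - \sqrt{\theta^2(\lambda) - 1}\big)^N \Big\}, & 0 \le \lambda < \alpha, \end{cases} \tag{1.70}$$

where

$$\theta(\lambda) = \frac{2C_\lambda - a + 1}{a + 1}, \qquad a = \cos\alpha. \tag{1.71}$$

Evidently, the two branch values of $H^*(\lambda)$ in (1.70) at $\lambda = \alpha$ are the same, since
$H^*(\lambda)$ is a continuous function on $[0, \pi]$.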
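
The two-branch formula for the OCFF can be evaluated directly. The following Python sketch is an illustration, not the authors' program; the function name `ocff` and the test values $\alpha = 2\pi/38$, $N = 88$ are assumptions chosen to match the numerical example given later in this section.

```python
import numpy as np

def ocff(lam, alpha, N):
    """Evaluate the optimum CFF H*(lam) on [0, pi]; returns (values, delta)."""
    a = np.cos(alpha)
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    theta = (2.0 * np.cos(lam) - a + 1.0) / (a + 1.0)       # shifted argument
    theta1 = (3.0 - a) / (a + 1.0)                          # its value at lam = 0
    root1 = np.sqrt(theta1 ** 2 - 1.0)
    p1 = 0.5 * ((theta1 + root1) ** N + (theta1 - root1) ** N)
    delta = 1.0 / p1                                        # stop-band bound
    out = np.empty_like(theta)
    stop = theta <= 1.0                                     # alpha <= lam <= pi
    out[stop] = np.cos(N * np.arccos(np.clip(theta[stop], -1.0, 1.0)))
    r = np.sqrt(theta[~stop] ** 2 - 1.0)                    # 0 <= lam < alpha
    out[~stop] = 0.5 * ((theta[~stop] + r) ** N + (theta[~stop] - r) ** N)
    return delta * out, delta

alpha, N = 2 * np.pi / 38, 88                               # assumed example values
stop_vals, delta = ocff(np.linspace(alpha, np.pi, 2000), alpha, N)
at_zero = ocff(0.0, alpha, N)[0][0]
print(f"delta = {delta:.3e}, H*(0) = {at_zero:.6f}, "
      f"max |H*| on [alpha, pi] = {np.max(np.abs(stop_vals)):.3e}")
```

The pass-band branch is used whenever the shifted argument exceeds 1, so the function never calls `arccos` outside its domain.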

For the purpose of practical application, an unavoidable problem is: how many
samples should we have for a given error $\delta > 0$?

It is apparent that when the sample size increases, the distance between the observation
points on the working area is also enlarged, i.e., the prospecting accuracy of the
measurement decreases.

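Now, suppose that the error $\delta > 0$ in (1.70) is given; we may select an appropriate
sample size $N$ in the following way.

At first, we know that in (1.69)

$$\Bigg[ \frac{3 - a}{a + 1} + \sqrt{\Big( \frac{3 - a}{a + 1} \Big)^2 - 1} \Bigg] \Bigg[ \frac{3 - a}{a + 1} - \sqrt{\Big( \frac{3 - a}{a + 1} \Big)^2 - 1} \Bigg] \tag{1.72}$$

$$= \Big( \frac{3 - a}{a + 1} \Big)^2 - \Big[ \Big( \frac{3 - a}{a + 1} \Big)^2 - 1 \Big] = 1, \tag{1.73}$$

and the first factor in (1.72) is greater than 1, because $a = \cos\alpha < 1$. Thus

$$0 < \frac{3 - a}{a + 1} - \sqrt{\Big( \frac{3 - a}{a + 1} \Big)^2 - 1} < 1 \tag{1.74}$$

holds.

Therefore, in (1.69), when $N$ is sufficiently large, we have the approximation

$$\tilde P_N(1) \approx \frac{1}{2}\Bigg[ \frac{3 - a}{a + 1} + \sqrt{\Big( \frac{3 - a}{a + 1} \Big)^2 - 1} \Bigg]^N. \tag{1.75}$$

For $\delta > 0$ given, $\delta = 1/\tilde P_N(1)$, we may put

$$\frac{1}{2}\Bigg[ \frac{3 - a}{a + 1} + \sqrt{\Big( \frac{3 - a}{a + 1} \Big)^2 - 1} \Bigg]^N \ge \frac{1}{\delta} \tag{1.76}$$

to obtain the smallest even integer $N$.

For example, let $\delta = 10^{-6}$, $\alpha = 2\pi/38$, then

$$a = \cos\alpha = 0.9863613, \qquad \log(2\delta^{-1}) = 14.5086, \tag{1.77}$$

$$N \ge \log(2\delta^{-1}) \Bigg/ \log\Bigg[ \frac{3 - a}{a + 1} + \sqrt{\Big( \frac{3 - a}{a + 1} \Big)^2 - 1} \Bigg] = \frac{14.5086}{0.16554} = 87.6440, \tag{1.78}$$

so that $N = 88$ is available for the filtering; the total number of the sample size is
$2N + 1 = 177$.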
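
The selection rule for $N$ is easy to automate. The following Python sketch is an illustration only; the function name `min_even_order` and the optional sampling-interval argument `Delta` (set to 1 by default) are assumptions, not notation from the text.

```python
import math

def min_even_order(delta, alpha, Delta=1.0):
    """Smallest even N with (1/2) * rho**N >= 1/delta, plus the real-valued bound."""
    a = math.cos(alpha * Delta)
    y = (3.0 - a) / (a + 1.0)
    rho = y + math.sqrt(y * y - 1.0)           # the factor greater than 1
    n_real = math.log(2.0 / delta) / math.log(rho)
    n = math.ceil(n_real)
    return n + (n % 2), n_real                 # round up to an even integer

N, n_real = min_even_order(1e-6, 2 * math.pi / 38)
print(N, 2 * N + 1)                            # filter order and total sample size
```

With unrounded intermediate values the real-valued bound differs from 87.6440 only in the third decimal place, so the resulting even order is the same.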

In our marine gravity meter, we have $\Delta = 0.4$ (sec), $a = \cos\Delta\alpha$, and the values of $N$ for different $\delta$
and $\alpha$ are listed in Table 1.1.

Table 1.1 ($\Delta = 0.4$, $\alpha = 2\pi/T$): values of $N$

    T     δ = 10^-5    δ = 10^-6    δ = 10^-7
   30        140          166          195
   33        156          186          215
   36        167          198          235
   38        184          219          255
   42        195          232          274
   45        208          248          294
   48        225          268          314
   60                                  392
  120                                  784
  180                                 1176
  240                                 1568
  300                                 1960
  420                                 2744


Table 1.2

Period = 30, Sampling interval: 0.4, Size = 88, δ = 0.1

Following values are Chebyshev functions:

  79.7296     39.80195     2.507649     0.087963    -0.078568
   0.035531   -0.004134   -0.017156     0.031807    -0.042165
   0.049676   -0.055247    0.059445    -0.062652     0.065124
  -0.067035    0.068516   -0.069674     0.070654    -0.071916
   0.075413   -0.088263    0.082962    -0.019691     0.062817
  -0.07005     0.071232   -0.071207     0.070894    -0.07048
   0.070001   -0.069474    0.068898    -0.068281     0.067621
  -0.066925    0.066192   -0.065426     0.064624    -0.063794
   0.062931   -0.062042    0.061122    -0.060179     0.059207
  -0.058302    0.057283   -0.056242     0.055178    -0.054091
   0.052981   -0.051856    0.050708    -0.049542     0.048356
  -0.047154    0.045933   -0.044698     0.043441    -0.042177
   0.040892   -0.039617    0.038436    -0.037784     0.039803
  -0.054132    0.098694    0.043759     0.014537    -0.025562
   0.026654   -0.025738    0.024405    -0.022991     0.021552
  -0.020105    0.018646   -0.017184     0.015714    -0.014239
   0.012761   -0.011281    0.009793    -0.008299     0.006808
  -0.00531     0.003817   -0.00231      0.000805

Following values are coefficients of Chebyshev filter:

   0.093067    0.0928      0.092749     0.092623     0.092151
   0.092073    0.091625    0.091122     0.090815     0.089978
   0.089387    0.088711    0.087718     0.087130     0.086157
   0.085154    0.08434     0.083005     0.081957     0.080822
   0.079404    0.07841     0.07702      0.075636     0.074438
   0.072735    0.071366    0.069911     0.068213     0.066962
   0.065308    0.063697    0.062275     0.060368     0.058841
   0.057235    0.055424    0.054081     0.052332     0.050662
   0.049184    0.047244    0.045725     0.044134     0.042375
   0.041094    0.039409    0.037833     0.036448     0.03463
   0.033261    0.031823    0.030252     0.029152     0.027659
   0.026293    0.025109    0.023535     0.02241      0.021222
   0.019933    0.019078    0.017859     0.016774     0.015847
   0.014594    0.013756    0.012861     0.0119       0.011296
   0.010386    0.009598    0.008934     0.008038     0.007475
   0.006871    0.006235    0.005837     0.00523      0.004716
   0.004277    0.003734    0.003396     0.003041     0.002693
   0.002428    0.002087    0.001789     0.005804


4. The Frequency Rectification by Filtering

There are many kinds of rectification for obtaining the true gravity values $g_t$,
such as latitude rectification, tide rectification and other geophysical rectifications.
One problem troubled the engineers: in Fig. 1.1, the electronic equipment
for measuring the vibration frequency of the metal wire can only offer the average
value of the frequency within a certain period of measuring time, but the instant
frequency $f(t_0)$ is needed for measuring each gravity value. They wanted to know
“whether it is possible to obtain the instant frequency from the series of average
frequencies”. As a matter of fact, suppose that the vibration frequency of the wire
in period $\Delta$ is $f(t)$; then the output of the measuring equipment is the average
frequency over $\Delta$, that is

$$\bar f = \frac{N \times f_0}{N_0\, n}, \tag{1.79}$$

where $f_0$ is the standard frequency of the crystal in the instrument, say $f_0 = 5$ Mc,
$N_0$ is the sharing times during the measure period, say $N_0 = 2 \times 10^6$, and $n = 256$.

Theoretically, let $f(t)$ be the instant frequency at time $t$; then we may assume

$$f_i = \bar f(t_i) = \frac{1}{\Delta} \int_{(i - .5)\Delta}^{(i + .5)\Delta} f(t)\, dt, \qquad t_i = i\Delta. \tag{1.80}$$

Now, starting from the set $\{f_i\}$, how do we get the instant frequency $f(t_i)$? We solve this
problem in the following approximate way.

Let $t = t_i + \tau$, and assume the frequency function $f(t)$ is smooth enough; then
we have the Taylor series

$$f(t) = f(t_i) + f'(t_i)\tau + \cdots + \frac{f^{(n)}(t_i)}{n!}\tau^n + o(\tau^n). \tag{1.81}$$

For practical calculation, we neglect the residual term in (1.81) and put it into the
following integral (taking $\Delta = 1$)

$$S_{mi} = \frac{1}{m} \int_{t_i - m/2}^{t_i + m/2} f(t)\, dt, \qquad m = \pm 1, \pm 2, \ldots, \tag{1.82}$$

then we have

$$S_{mi} = \frac{1}{m} \int_{-m/2}^{m/2} f(t_i + \tau)\, d\tau = \frac{1}{m}\Big[ m f(t_i) + \frac{f^{(2)}(t_i)}{2!} \cdot \frac{2}{3} \Big( \frac{m}{2} \Big)^3 + \cdots + \frac{f^{(2n)}(t_i)}{(2n)!} \cdot \frac{2}{2n + 1} \Big( \frac{m}{2} \Big)^{2n + 1} \Big], \tag{1.83}$$

or equivalently

$$S_{mi} = f(t_i) + \frac{f^{(2)}(t_i)}{3!} \Big( \frac{m}{2} \Big)^2 + \cdots + \frac{f^{(2n)}(t_i)}{(2n + 1)!} \Big( \frac{m}{2} \Big)^{2n}, \tag{1.84}$$

where $m = 1, 3, 5, 7, \ldots$.

Now, the problem is changed into: when $\{S_{mi}\}$ is known, how do we get an
appropriate estimate of the instant frequency $f(t_i)$? We consider
$a_0 = f(t_i)$, $a_1 = f^{(2)}(t_i)$, $a_2 = f^{(4)}(t_i)$, $a_3 = f^{(6)}(t_i)$, for fixed $t_i$, as four unknown parameters in (1.84),
and solve the following system of linear equations

$$\begin{aligned}
S_{1i} &= a_0 + \tfrac{1}{3!}\big(\tfrac{1}{2}\big)^2 a_1 + \tfrac{1}{5!}\big(\tfrac{1}{2}\big)^4 a_2 + \tfrac{1}{7!}\big(\tfrac{1}{2}\big)^6 a_3, \\
S_{3i} &= a_0 + \tfrac{1}{3!}\big(\tfrac{3}{2}\big)^2 a_1 + \tfrac{1}{5!}\big(\tfrac{3}{2}\big)^4 a_2 + \tfrac{1}{7!}\big(\tfrac{3}{2}\big)^6 a_3, \\
S_{5i} &= a_0 + \tfrac{1}{3!}\big(\tfrac{5}{2}\big)^2 a_1 + \tfrac{1}{5!}\big(\tfrac{5}{2}\big)^4 a_2 + \tfrac{1}{7!}\big(\tfrac{5}{2}\big)^6 a_3, \\
S_{7i} &= a_0 + \tfrac{1}{3!}\big(\tfrac{7}{2}\big)^2 a_1 + \tfrac{1}{5!}\big(\tfrac{7}{2}\big)^4 a_2 + \tfrac{1}{7!}\big(\tfrac{7}{2}\big)^6 a_3.
\end{aligned} \tag{1.85}$$

Thus we have

$$a_0 = f(t_i) = \begin{vmatrix} S_{1i} & 1 & 1 & 1 \\ S_{3i} & 3^2 & 3^4 & 3^6 \\ S_{5i} & 5^2 & 5^4 & 5^6 \\ S_{7i} & 7^2 & 7^4 & 7^6 \end{vmatrix} \Bigg/ \begin{vmatrix} 1 & 1 & 1 & 1 \\ 1 & 3^2 & 3^4 & 3^6 \\ 1 & 5^2 & 5^4 & 5^6 \\ 1 & 7^2 & 7^4 & 7^6 \end{vmatrix}$$

$$= 1.1962891\, S_{1i} - 0.23925781\, S_{3i} + 0.04785156\, S_{5i} - 0.00488281\, S_{7i}, \tag{1.86}$$

where

$$\begin{aligned}
S_{1i} &= \int_{-1/2}^{1/2} f(t_i + \tau)\, d\tau = \bar f(t_i) = f_i, \\
S_{3i} &= \frac{1}{3} \int_{-3/2}^{3/2} f(t_i + \tau)\, d\tau = \frac{1}{3} \sum_{j=-1}^{1} f_{i+j}, \\
S_{5i} &= \frac{1}{5} \int_{-5/2}^{5/2} f(t_i + \tau)\, d\tau = \frac{1}{5} \sum_{j=-2}^{2} f_{i+j}, \\
S_{7i} &= \frac{1}{7} \int_{-7/2}^{7/2} f(t_i + \tau)\, d\tau = \frac{1}{7} \sum_{j=-3}^{3} f_{i+j}.
\end{aligned} \tag{1.87}$$

Putting (1.87) into (1.86) we obtain the formula

$$f(t_i) = b_1 f_i + b_2 (f_{i-1} + f_{i+1}) + b_3 (f_{i-2} + f_{i+2}) + b_4 (f_{i-3} + f_{i+3}), \tag{1.88}$$

where

$$\begin{cases} b_1 = 1.12540927, & b_2 = -0.07087979, \\ b_3 = 0.00887280, & b_4 = -0.00069751. \end{cases} \tag{1.89}$$

It is very interesting to note that formula (1.88) and the coefficients (1.89) show
that it is possible to obtain the instant frequency from the averaged frequencies by
filtration with symmetric coefficients.
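
The weights of the window averages and the symmetric filter coefficients can be recomputed from the Taylor-series relation between the window averages and the even derivatives. The following Python sketch is an illustration, not the authors' computation; it assumes unit sampling interval and the four windows $m = 1, 3, 5, 7$, and the variable names are hypothetical.

```python
import numpy as np
from math import factorial

m = np.array([1, 3, 5, 7])                    # odd window lengths
# Row for window m: S_m = sum_j a_j * (m/2)**(2j) / (2j+1)!,  j = 0..3
A = np.array([[(mi / 2) ** (2 * j) / factorial(2 * j + 1) for j in range(4)]
              for mi in m])
# a_0 = w . (S_1, S_3, S_5, S_7): w is the first row of the inverse of A
w = np.linalg.solve(A.T, np.array([1.0, 0.0, 0.0, 0.0]))

# S_m averages f over offsets |j| <= (m-1)/2, so offset r collects w_m/m from
# every window that is wide enough to contain it.
b = np.array([sum(w[q] / m[q] for q in range(4) if (m[q] - 1) // 2 >= r)
              for r in range(4)])
print(np.round(w, 8))
print(np.round(b, 8))
```

The coefficients satisfy $b_1 + 2(b_2 + b_3 + b_4) = 1$, so a constant frequency is reproduced exactly.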


5. Practical Checking in the Prospecting Field of the East Sea of China 

In mathematical applications, the most important thing is to check whether the results
of theory and methods agree with practice, for otherwise the work loses its real meaning
of “application”.

The OCFF $H^*(\lambda)$ designed with the Chebyshev function was compared with several
other CFFs which we obtained in 1970-1975. One is the so-called “triple averaging”
method, i.e. the output series of a moving average is refiltered twice with the
same filtering coefficients. Another filter is designed with a Gaussian pattern

$$h_k = c\, e^{-0.0004 (k\Delta)^2}, \qquad k = 0, \pm 1, \pm 2, \ldots. \tag{1.90}$$

Under the same accuracy of measurement, some results of these CFFs are shown
in Table 1.3 for comparison.


Table 1.3

  Filter            Sampling interval    Sample size    Real distance between
                    Δ (sec)              N              two prospecting points
  Triple average    0.6                  1024           3.15 KM
  Gauss             0.4                  1024           2.00 KM
  Chebyshev         0.4                  512            1.00 KM


In the 1970's we used a comparatively small computer whose computing speed and
memory were at a low level, and that limited the reduction of the
long-period interference in our measuring system. To further improve
the filtering performance, the output series of the Chebyshev filter was
passed through a second filter whose coefficients were designed as

$$h_k = \exp\{-0.3328 \times 10^{-5} (kT)^2\}, \qquad k = 0, \pm 1, \pm 2, \pm 3, \tag{1.91}$$

i.e.

$$h_0 = 1, \quad h_1 = h_{-1} = 0.869, \quad h_2 = h_{-2} = 0.571, \quad h_3 = h_{-3} = 0.283. \tag{1.92}$$

Joining this filter with the Chebyshev filter mentioned before is approximately equivalent
to a filter with the truncated frequency at $\alpha = 2\pi/420$.

The whole system (as shown in Fig. 1.1), called the “ZY-1 dynamic marine gravity
meter”, was installed on a boat, and very severe tests were carried out during
1970-1975 by prospecting engineers. During these five years, the strongest wind
encountered in navigation was of the 6th grade, and the average acceleration of the
“noise” was 50,000 times stronger than that of the signal. Fortunately, the S.D.
error of our measuring system is only about 1-2 mgal.

For example, one output record of the prospecting on the East Sea of China in
Oct. 1974, together with the real result obtained by a static Canadian gravity meter,
is shown in Fig. 1.5.

CASE II

Digital Filters Design by Maximum Entropy Modelling 


1. Problem Statement 

As we know, many traditional filter designs are based on functional approximations,
and some research in recent years has introduced filter design
by the time series modelling approach, i.e. one based on the theory and methods of
stationary time series, such as the model fitting and spectral estimation theory
introduced in Chapter 3. Such methods are not only simple and convenient
for designing but also have some special advantages. It is not necessary to have an
analytic formulation of the filter's response function; instead, only data of the amplitude
response function (ARF) are required. Moreover, the filter so obtained may possess
the property of “minimum phase delay”, etc.

Now, an electronic equipment needs a digital filter; the main specification requirements
on the amplitude response function of the filter are as follows:

(1). The central frequency is $f = 19$ (kHz).

(2). The bandwidth of a 3 dB decay is $B = 1.0 \sim 1.6$ (kHz).

(3). The decay of the ARF outside the band [15.2, 23.75] is not less than 20 dB.

The figure of the ARF is shown in Fig. II.1.

Now, we want to obtain the coefficients of such a filter in the time domain from experimental
data $\{|H^*(\lambda_k)|,\ k = 1, 2, \ldots, N\}$ of the ARF. Sometimes it is also required that the
filter be designed in a “causal” version, i.e. be physically realizable.

The mathematical statement of the digital filter design may be briefly formulated
as follows:

Suppose that an ARF $|H^*(\lambda)|$, $-\pi \le \lambda \le \pi$, of a digital filter is given, and we
want to design coefficients $\{h_k,\ k = 0, 1, 2, \ldots\}$ of the filter in the time domain, such
that

(a). $\sum_k |h_k|^2 < \infty$ (the convergent property). (II.1)

(b). $\sum_k h_k z^k \ne 0$, $|z| < 1$ (the minimum phase delay). (II.2)

(c). $\big|\sum_k h_k e^{-ik\lambda}\big| = |H^*(\lambda)|$. (II.3)

Fig. II.1 The ARF of a digital filter

A filter satisfying conditions (a)-(c) above will be called an “optimum filter” (O.F.).
Now, a mathematical problem arises in connection with such a filter design, that
is: for any given function, even if the ARF is continuous on $[-\pi, \pi]$, does such an O.F.
always exist?

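By the condition (II.1), we know that the function

$$H(e^{-i\lambda}) = \sum_{k=0}^{\infty} h_k e^{-ik\lambda}, \qquad -\pi \le \lambda \le \pi \tag{II.4}$$

is well defined and belongs to $L^2(d\lambda)$.

Put

$$H(\zeta) = \frac{1}{2\pi i} \oint_{|\xi| = 1} \frac{H(\xi)}{\xi - \zeta}\, d\xi, \qquad |\zeta| < 1, \tag{II.5}$$

then it is a Cauchy integral and may be represented as a Taylor series

$$H(\zeta) = \frac{1}{2\pi i} \oint_{|\xi| = 1} H(\xi) \sum_{n=0}^{\infty} \frac{\zeta^n}{\xi^{n+1}}\, d\xi = \sum_{n=0}^{\infty} \zeta^n\, \frac{1}{2\pi i} \oint_{|\xi| = 1} \frac{H(\xi)}{\xi^{n+1}}\, d\xi,$$

where

$$\frac{1}{2\pi i} \oint_{|\xi| = 1} \frac{H(\xi)}{\xi^{n+1}}\, d\xi = \frac{1}{2\pi} \int_0^{2\pi} H(e^{-i\lambda}) e^{in\lambda}\, d\lambda = h_n, \qquad n \ge 0.$$

That shows

$$H(\zeta) = \sum_{n=0}^{\infty} h_n \zeta^n$$

is an analytic function in $|\zeta| < 1$. In analytic function theory, $H(\zeta)$ belongs to the class
$H^2$ (see Privalov (1950)), and possesses

$$\lim_{\rho \to 1} H(\rho e^{-i\lambda}) = \sum_{n=0}^{\infty} h_n e^{-in\lambda} = H(e^{-i\lambda}), \tag{II.9}$$

$$\sum_{n=0}^{\infty} |h_n|^2 = \frac{1}{2\pi} \int_0^{2\pi} |H(e^{-i\lambda})|^2\, d\lambda = \frac{1}{2\pi} \int_0^{2\pi} |H^*(\lambda)|^2\, d\lambda < +\infty. \tag{II.10}$$

According to the condition (II.2), which means

$$H(\zeta) \ne 0, \qquad |\zeta| < 1, \tag{II.11}$$

we can define

$$B(z) = \log H(z), \qquad |z| < 1, \tag{II.12}$$

which is analytic in the unit disc with its real part

$$\mathrm{Re}\, B(z) = \log |H(z)| \tag{II.13}$$

a harmonic function in $|z| < 1$.

Thus, we have

$$\log |H(0)| = \frac{1}{2\pi} \int_0^{2\pi} \log |H(\rho e^{-i\lambda})|\, d\lambda, \qquad (0 < \rho < 1). \tag{II.14}$$

Since for any $a > 0$,

$$|\log a| = 2 \log^+ a - \log a \tag{II.15}$$

holds, so we have

$$\frac{1}{2\pi} \int_0^{2\pi} \big| \log |H(\rho e^{-i\lambda})| \big|\, d\lambda = \frac{1}{2\pi} \int_0^{2\pi} 2 \log^+ |H(\rho e^{-i\lambda})|\, d\lambda - \frac{1}{2\pi} \int_0^{2\pi} \log |H(\rho e^{-i\lambda})|\, d\lambda \tag{II.16}$$

$$= \frac{1}{2\pi} \int_0^{2\pi} 2 \log^+ |H(\rho e^{-i\lambda})|\, d\lambda - \log |H(0)| \qquad (\text{by (II.14)})$$

$$\le \frac{1}{2\pi} \int_0^{2\pi} |H(\rho e^{-i\lambda})|^2\, d\lambda - \log |H(0)| \qquad (\text{by } \log^+ a \le a). \tag{II.17}$$

By the well-known Fatou lemma (see Loeve (1963)), for $H(e^{-i\lambda})$ defined in (II.9),
we obtain

$$\begin{aligned}
\frac{1}{2\pi} \int_0^{2\pi} \big| \log |H(e^{-i\lambda})| \big|\, d\lambda
&\le \lim_{\rho \to 1} \frac{1}{2\pi} \int_0^{2\pi} \big| \log |H(\rho e^{-i\lambda})| \big|\, d\lambda \\
&\le \lim_{\rho \to 1} \frac{1}{2\pi} \int_0^{2\pi} |H(\rho e^{-i\lambda})|^2\, d\lambda - \log |H(0)| \\
&= -\log |H(0)| + \frac{1}{2\pi} \int_0^{2\pi} |H(e^{-i\lambda})|^2\, d\lambda \\
&= -\log |H(0)| + \sum_{k} |h_k|^2 < +\infty,
\end{aligned} \tag{II.18}$$

which shows that if we require the condition (II.1) for our filter, then

$$\int_0^{2\pi} \big| \log |H(e^{-i\lambda})| \big|\, d\lambda = \int_0^{2\pi} \big| \log |H^*(\lambda)| \big|\, d\lambda < +\infty \tag{II.19}$$

is necessary.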

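Secondly, when conditions (II.2), (II.3) are satisfied further, then $H(\zeta)$ in (II.5)
is uniquely determined, up to a constant factor $e^{ia}$, by

$$H(\zeta) = \exp\Big\{ \frac{1}{2\pi} \int_0^{2\pi} \log |H(e^{-i\lambda})|\, \frac{e^{-i\lambda} + \zeta}{e^{-i\lambda} - \zeta}\, d\lambda \Big\} = \exp\Big\{ \frac{1}{2\pi} \int_0^{2\pi} \log |H^*(\lambda)|\, \frac{e^{-i\lambda} + \zeta}{e^{-i\lambda} - \zeta}\, d\lambda \Big\}, \qquad |\zeta| < 1, \tag{II.20}$$

(see Privalov (1950)), and

$$\lim_{\rho \to 1} H(\rho e^{-i\lambda}) = H(e^{-i\lambda}) = \sum_{k=0}^{\infty} h_k e^{-ik\lambda}. \tag{II.21}$$

Evidently, since we have only the boundary value $|H^*(\lambda)|$ of $H(\zeta)$ in (II.20), obtaining
$\{h_k,\ k \ge 0\}$, i.e. the Taylor coefficients of $H(\zeta)$, is not an easy job.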

2. Design the Filter by Maximum Entropy Modelling 

So far, we have proved that under some regularity conditions on a given ARF
the O.F. is obtainable, but in a rather complicated way. Another approach to filter design is
to seek an approximate solution of the problem, i.e. we may change the condition
(II.3) into: find coefficients $\{h_k,\ k \ge 0\}$ of the filter such that

$$\Big| \sum_{k=0}^{\infty} h_k e^{-ik\lambda} \Big| \approx |H^*(\lambda)|, \tag{II.22}$$

i.e. the left-hand side is “as close to $|H^*(\lambda)|$ as possible” under a certain criterion.

Now, suppose that $|H^*(\lambda)|$ satisfies the condition (II.19). On putting

$$f(\lambda) = \frac{1}{2\pi} |H^*(\lambda)|^2, \qquad 0 \le \lambda \le 2\pi,$$

and writing

$$R(k) = \int_0^{2\pi} f(\lambda) e^{ik\lambda}\, d\lambda, \qquad k = 0, \pm 1, \pm 2, \ldots, \tag{II.23}$$

$\{R(k)\}$ may be considered as a covariance function of $k$, since $\{R(k)\}$ is a
non-negative definite sequence, i.e.

$$\sum_{j=1}^{M} \sum_{l=1}^{M} R(j - l)\, \xi_j \bar\xi_l \ge 0 \tag{II.24}$$

for any $M > 0$ and any complex numbers $\xi_1, \ldots, \xi_M$.
We suppose again that $R(k)$, $0 \le k \le p$, are real and known, and keep the matrix

$$R_{p+1} = \begin{pmatrix} R(0) & R(1) & \cdots & R(p) \\ R(1) & R(0) & \cdots & R(p - 1) \\ \vdots & \vdots & & \vdots \\ R(p) & R(p - 1) & \cdots & R(0) \end{pmatrix}$$

positive definite. Then we know that under the criterion of O.P.E. or M.E. (see
Chapter 3) we may find an optimum estimate of the spectral density

$$\hat f(\lambda) = \frac{\theta_0^2}{2\pi} \cdot \frac{1}{\big| \sum_{k=0}^{p} \varphi_k e^{-ik\lambda} \big|^2}, \qquad 0 \le \lambda \le 2\pi, \tag{II.25}$$

where the parameters $\{\theta_0, \varphi_1, \varphi_2, \ldots, \varphi_p\}$ are the solution of the following
Yule-Walker equation (see Theorems 2.9 and 2.10 in Chapter 2)

$$R_{p+1} (1, \varphi_1, \ldots, \varphi_p)^{\tau} = (\theta_0^2, 0, \ldots, 0)^{\tau}. \tag{II.26}$$

On putting

$$H(z) = \frac{\theta_0}{\sum_{k=0}^{p} \varphi_k z^k} = \sum_{k=0}^{\infty} c_k z^k, \qquad |z| < 1, \tag{II.27}$$

the coefficients $\{c_k,\ k = 0, 1, \ldots\}$ may be obtained by the following recursive
algorithm

$$\begin{aligned}
c_0 &= \theta_0, \\
c_1 &= -\varphi_1 c_0, \\
&\ \ \vdots \\
c_k &= -\varphi_1 c_{k-1} - \varphi_2 c_{k-2} - \cdots - \varphi_k c_0, \qquad (\varphi_k = 0,\ k > p)
\end{aligned} \tag{II.28}$$

(see Theorem 2.5 in Chapter 2).

Then

(a). $\sum_{k=0}^{\infty} |c_k| < +\infty$, so $\sum_{k=0}^{\infty} |c_k|^2 < +\infty$. (II.29)

(b). $\sum_{k=0}^{\infty} c_k z^k \ne 0$, $|z| < 1$ (II.30)

(see Corollary of Theorem 2.9).

(c). Since $\hat f(\lambda) = \frac{1}{2\pi} |H(e^{-i\lambda})|^2$ is a good approximation of $f(\lambda) = \frac{1}{2\pi} |H^*(\lambda)|^2$,
then

$$|H(e^{-i\lambda})| \approx |H^*(\lambda)| \tag{II.31}$$

is an acceptable approximation.
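
The recursive algorithm for the coefficients $\{c_k\}$ is short enough to sketch in code. The following Python function is an illustration only; the name `wold_coefficients` and the first-order example $\varphi = (1, -0.5)$ with $\theta_0 = 1$ are assumptions chosen so that the result ($c_k = 0.5^k$) is easy to verify by hand.

```python
def wold_coefficients(theta0, phi, n_terms):
    """Return c_0..c_{n_terms-1} of theta0 / sum_k phi[k] z**k, assuming phi[0] == 1."""
    p = len(phi) - 1
    c = [theta0]
    for k in range(1, n_terms):
        # c_k = -(phi_1 c_{k-1} + ... + phi_k c_0), with phi_j = 0 for j > p
        c.append(-sum(phi[j] * c[k - j] for j in range(1, min(k, p) + 1)))
    return c

# Hypothetical first-order example: 1 / (1 - 0.5 z) = sum_k 0.5**k z**k
c = wold_coefficients(1.0, [1.0, -0.5], 4)
print(c)
```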

However, only a finite number of terms $\{c_k,\ k = 0, 1, \ldots, N\}$ are available
in practical filter design, so one troublesome problem is the
selection of the number $N$ needed to obtain a satisfactory approximation in the sense of
(II.31).

The following theorem gives a rough upper bound of the number $N$:

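Theorem II.1. Suppose that $(\varphi_1, \varphi_2, \ldots, \varphi_p)$ are the solution of the Yule-Walker equation
(II.26). Denoting the polynomial

$$\Phi(z) = \sum_{k=0}^{p} \varphi_k z^k \quad (\varphi_0 = 1) \quad = \varphi_p \prod_{s=1}^{p} (z - z_s), \qquad |z_s| > 1,\ s = 1, 2, \ldots, p, \tag{II.32}$$

we select a suitable positive number $\varepsilon > 0$, such that

$$\rho = \min_{1 \le s \le p} |z_s| - \varepsilon > 1, \tag{II.33}$$

then

$$|c_k| \le \theta_0 M \rho^{-k}, \qquad k = 0, 1, 2, \ldots, \tag{II.34}$$

where $M^{-1} = \inf_{|z| = \rho} |\Phi(z)|$.

Proof. By the Corollary of Theorem 2.9, we know that all of the roots $\{z_s,\ s =
1, 2, \ldots, p\}$ of $\Phi(z)$ satisfy

$$|z_s| > 1, \qquad s = 1, 2, \ldots, p. \tag{II.35}$$

Therefore there exists a positive number $\varepsilon > 0$ which keeps (II.33) true.

Then $H(z) = \theta_0 / \Phi(z)$ is an analytic function in $|z| \le \rho$ and $|H(z)| \ne 0$, $|z| \le \rho$.

Therefore, the Taylor coefficients $\{c_k\}$ of $H(z)$ can be represented by a Cauchy
integral

$$c_k = \frac{H^{(k)}(0)}{k!} = \frac{1}{2\pi i} \oint_{|\zeta| = \rho} \frac{H(\zeta)}{\zeta^{k+1}}\, d\zeta. \tag{II.36}$$

Put $\zeta = \rho e^{-i\theta}$, $d\zeta = -i\rho e^{-i\theta}\, d\theta$; then

$$|c_k| = \frac{|H^{(k)}(0)|}{k!} \le \frac{1}{2\pi \rho^k} \int_0^{2\pi} |H(\rho e^{-i\theta})|\, d\theta \le \theta_0 \rho^{-k} \sup_{|z| = \rho} \frac{1}{|\Phi(z)|} = \theta_0 M \rho^{-k}, \tag{II.37}$$

and that ends the proof. ∎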


It is not difficult to find numerically a suitable value $\rho$ which satisfies
(II.33). We only need to let $\rho$ grow from 1 towards a suitably large number $R$, and find
the smallest root modulus $|z_s|$: when $\zeta = \rho e^{i\theta_s} = z_s$, then $\Phi(\rho e^{i\theta_s}) = 0$ (see Fig. II.2).

Fig. II.2 Graphical illustration of finding a positive
number $\rho > 1$ with $|\Phi(z)| \ne 0$, $|z| \le \rho$

For example, let

$$\Phi(z) = 1 - 1.5z + z^2 - 0.25z^3 = -\frac{1}{4} (z - 2)(z - 1 - i)(z - 1 + i). \tag{II.38}$$

Fig. II.3 Roots allocation of $\Phi(z)$

The numerical calculation for seeking an appropriate value $\rho$ is given in Table II.1.

Accordingly, we may take $\rho = 1.3$, and

$$M = \frac{1}{\inf_{|z| = 1.3} |\Phi(z)|} = 12.89, \tag{II.40}$$

then select $D > 0$, such that

$$|c_N| \le M \rho^{-N} \le D, \tag{II.41}$$

where $c_0 = 1$. Thus we have

$$N \ge \frac{\log (M/D)}{\log \rho}. \tag{II.42}$$

Of course, this is a very rough upper bound; for example, by putting $D = 0.293$,
$\rho = 1.3$, $M = 12.89$, then $N \ge 14.43$, so we may select $N = 14$ or $N = 15$ for a
preliminary value, but actually, when $k \ge 14$, we have $|c_k| < 0.02$.
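
The quantities in this example can be recomputed numerically. The following Python sketch is an illustration only: it takes the cubic $\Phi(z) = 1 - 1.5z + z^2 - 0.25z^3$, $\rho = 1.3$ and $D = 0.293$ from the example, while the angular grid of 20001 points is an assumption, so the value of $M$ it finds may differ slightly from the tabulated 12.89.

```python
import numpy as np

phi = [1.0, -1.5, 1.0, -0.25]                 # Phi(z) = 1 - 1.5 z + z^2 - 0.25 z^3
roots = np.roots(phi[::-1])                   # np.roots expects highest degree first
r_min = np.min(np.abs(roots))                 # smallest root modulus

rho = 1.3                                     # any value in (1, r_min) is admissible
theta = np.linspace(0.0, 2.0 * np.pi, 20001)  # grid on the circle |z| = rho (assumption)
z = rho * np.exp(1j * theta)
M = 1.0 / np.min(np.abs(np.polyval(phi[::-1], z)))

D = 0.293
N_bound = np.log(M / D) / np.log(rho)
print(f"min |z_s| = {r_min:.4f}, M = {M:.2f}, N >= {N_bound:.2f}")
```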


3. A Practical Filter Design 

As we have shown, the design of an electrical equipment needs a digital filter, and the
required principal specifications are listed in (1), (2), (3) in the first Section of the
present case.

In this practical design, 64 points $|H_k| = |H^*(\lambda_k)|$, $k = 0, 1, 2, \ldots, 63$, can be
offered; then

(1) Put

$$f_k = \frac{1}{2\pi} |H_k|^2, \qquad k = 0, 1, 2, \ldots, 63 \tag{II.43}$$

as the spectral density function.

(2) The covariance function $\{R(0), R(1), R(2), \ldots, R(p)\}$ may be obtained by
FFT.

(3) Solve the Yule-Walker equation (II.26) by the Levinson algorithm (see Theorem 2.7)
and obtain the parameters of the model $\{\theta_0, \varphi_1, \varphi_2, \ldots, \varphi_p\}$.

(4) The coefficients of the filter $\{c_k,\ k = 0, 1, 2, \ldots, N\}$ may be derived by the iterative
algorithm (II.28).
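
Steps (1)-(3) of this procedure can be sketched in a few lines. The following Python code is an illustration under stated assumptions, not the program used in the study: the 64 amplitude samples are a hypothetical band-pass shape (two Gaussian bumps on a small floor, symmetric about $\pi$ so that the covariances are real), the order $p = 8$ is arbitrary, and `levinson` is a generic Levinson-Durbin recursion written for this example.

```python
import numpy as np

def levinson(R, p):
    """Solve sum_j R(|i-j|) phi_j = 0 (i = 1..p) with phi_0 = 1; returns (phi, theta0^2)."""
    phi = np.zeros(p + 1)
    phi[0] = 1.0
    err = R[0]
    for m in range(1, p + 1):
        k = -np.dot(phi[:m], R[m:0:-1]) / err          # reflection coefficient
        phi[1:m + 1] = phi[1:m + 1] + k * phi[m - 1::-1]
        err *= 1.0 - k * k
    return phi, err

n, p = 64, 8
lam = 2.0 * np.pi * np.arange(n) / n
centre, width = 0.3 * np.pi, 0.15                      # hypothetical band-pass shape
amp = (0.05
       + np.exp(-((lam - centre) / width) ** 2)
       + np.exp(-((lam - (2.0 * np.pi - centre)) / width) ** 2))

f = amp ** 2 / (2.0 * np.pi)                           # step (1): spectral density samples
R = np.real(np.fft.ifft(2.0 * np.pi * f))[:p + 1]      # step (2): R(k) by inverse FFT
phi, theta0_sq = levinson(R, p)                        # step (3): Yule-Walker solution
print(np.round(phi, 4), round(float(np.sqrt(theta0_sq)), 4))
```

The inverse FFT of $2\pi f_k$ is the Riemann-sum approximation of the covariance integral on the 64-point grid; step (4) then applies the recursion for $\{c_k\}$ to the resulting parameters.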

In our case we took $N = 19$, $p = 19$; the covariance functions and the parameters
of the model are listed in Table II.2 and Table II.3 respectively, and

$$\theta_0 = \Big( \sum_{k=0}^{p} \varphi_k R(k) \Big)^{1/2} = 2.634. \tag{II.44}$$

Table II.2

    k      R(k)        k      R(k)
    0      23.22       10      1.18
    1       3.93       11     -2.67
    2     -16.99       12     -1.56
    3      -8.24       13      1.40
    4       7.83       14      1.43
    5       7.84       15     -0.56
    6      -2.77       16     -1.07
    7      -6.16       17      0.054
    8       0.058      18      0.615
    9       4.32       19      0.126


Table II.3

    k      φ_k         k      φ_k
    0       1          10      0.262
    1      -0.256      11      0.057
    2       1.186      12      0.183
    3      -0.046      13      0.048
    4       0.746      14      0.125
    5       0.028      15      0.044
    6       0.524      16      0.075
    7       0.068      17      0.034
    8       0.361      18      0.032
    9       0.059      19      0.018


The Wold coefficients $\{c_k,\ k = 0, 1, 2, \ldots, 20\}$, i.e. the optimum filter, are listed
in Table II.4, where $\{c_k,\ k = 0, 1, 2, \ldots, 20\}$ has been normalized such that

$$\max_{\lambda} \Big| \sum_{k} c_k \exp(-ik\lambda) \Big|^2 = 1. \tag{II.45}$$

Fig. II.4 shows the ARF of the designed filter; the two ARFs almost
coincide except for a slight fluctuation. Besides, all of the specifications listed in (1)-(3)
are satisfied.

If we realize the designed filter in recursive (IIR) form, we use the operation

$$y_t = \theta_0 x_t - \sum_{k=1}^{p} \varphi_k y_{t-k}. \tag{II.46}$$












CASE III 


The Spectral Analysis of the Visual Evoked Potentials of 
Normal and Congenital Dull Children (Down’s Disease)


1. Introduction 

Scientists have studied the statistical dependence between the Visual
Evoked Potential (VEP) and the Intelligence Impediment (II) for many years (see
Chalke (1965), Rhodes (1969), Straumanis (1973), Xu (1979)). Unfortunately, the
analysis in the published papers uses comparatively simple mathematical methods,
i.e. the canonical time domain wave form analysis, such as measuring the
incubation period, comparing the difference of amplitudes, etc. Therefore, the
conclusions on the dependence between VEP and II often differ from method to
method. Chalke (1965) reported that the incubation period of II is longer than that
of the normal population, but Straumanis (1973) came to a different conclusion: he
reported that there was no significant difference between them. The observation
of Rhodes (1969) showed that the amplitudes of the VEP of II children are, in
general, greater than those of normal children.

In China, physiologists are interested in this topic too. Liu and his research
group obtained results different from the report of the former Soviet researchers
Novikova and Chislina (see Liu et al. (1963)). The latter researchers reported that
the α-wave in the EEG records of dull children was not a notable component, but
Liu et al. investigated the EEG of 106 dull children in 1963 and discovered that in
their records α-wave and θ-wave components appeared almost alternately, even
though the θ-wave emerged more frequently.

Xu (1979) studied the VEP and Auditory Evoked Potential (AEP) of 30 normal
children aged 9-16 years and 14 congenital dull children in the same age
group. The statistical analysis used in Xu's paper for the VEP and AEP can be
summarized as follows: he measured both the incubation periods and the
amplitudes $N_i$, $P_i$ ($i = 1, 2, 3$) of VEP and AEP respectively, and then carried out
the t-test on those data. The results for the two populations are listed in Table
III.1, where “group D” denotes the dull children and “group N” the normal children.

Table III.1

                  Incubation periods (m sec)              Amplitude (m volts)
  Components      N1        P2        N2        P3        N1-P2     P2-N2     P3

  Mean VEP
    Group D       86.07     141.7     238.9     336.6     10.88     13.55     4.95
    Group N       72.5      137.8     213.9     351.6     13.3      14.1      2.46
    t-test        p < .01   p > .05   p < .05   p > .05   p > .05   p > .05   p < .05

  Mean AEP
    Group D       134.28    192.8     273.9     418.4     24.69     33.52     9.86
    Group N       126.7     189.2     253.5     365.0     20.6      21.99     2.76
    t-test        p > .05   p > .05   p > .05   p < .05   p > .05   p < .05   p < .01


According to the test results of Xu in Table III.1, only 3 of the 7 t-tests for
the VEP are statistically significant ($N_1$, $N_2$ and the amplitude $P_3$), and
the same proportion appears in the t-tests for the AEP (the incubation period
$P_3$ and the amplitudes $P_2 - N_2$ and $P_3$). This shows that simple analysis of
the waveforms cannot give a satisfactory conclusion on the discrimination of the
two populations.


2. Spectral Analysis of VEP Records for Dull and Normal Children 


Some physiologists in China suggested investigating this interesting topic by
advanced statistical methods of stochastic processes; they thought that
different characteristic information of the dull and normal populations might
remain hidden in the VEP records.

Three VEP records of a normal girl taken at different times are shown in
Fig. III.1; they show that the VEP of a person is a comparatively stable curve.
The VEP curves of a dull girl and a normal boy are shown in Fig. III.2; they
suggest that the curves of the two populations are likely to have different
components in the frequency domain.

Fig. III.1 The VEP records of a normal girl at different times

For digital spectral analysis, the first step is to digitize the continuous VEP
record into a discrete series. The sampling interval $\Delta$ is determined by
Shannon’s theorem (see Theorem 1.10, Chapter 1):

$$ \Delta = \frac{1}{2W}, \tag{III.1} $$

where $W$ is the upper bound frequency of the VEP records. According to many
studies, physiologists suggested a reliable value for $W$ as an upper bound,

$$ W = 50 \ \text{Hz}, \tag{III.2} $$

then we may select

$$ \Delta = \frac{1}{2W} = \frac{1}{100} = 0.01 \ \text{(sec)} \tag{III.3} $$

and digitize a record into a series

$$ \xi_k = x(k\Delta), \quad k = 1, 2, \ldots, N, \ldots, \tag{III.4} $$

where $x(t)$ is the VEP record. To use the computer in practical applications,
all of the records must be digitized into discrete data, and the selection of
a suitable upper bound $W$ is very important. Evidently, selecting $W$ too large
increases the computing time, since $\{\xi_k = x(k\Delta)\}$ then contains
unnecessary sample points. On the other hand, if $W$ is selected too small, i.e.
it is not an upper bound of the signal’s frequencies, then the “aliasing
phenomenon” might occur.

Fig. III.2 The VEP records of a normal and a dull child

In fact, suppose that $x(t)$, $-\infty < t < \infty$, is a stationary process.
Let $\Delta > 0$, and put $\xi_k = x(k\Delta)$, $k = 0, \pm 1, \pm 2, \ldots$;
then $\{\xi_k\}$ is a stationary series, and its spectral density $f_\xi(\lambda)$
exists:

$$ f_\xi(\lambda) = \frac{1}{\Delta} \sum_{k=-\infty}^{\infty} f_x\!\left(\frac{\lambda + 2k\pi}{\Delta}\right), \quad \lambda \in \Pi = [0, 2\pi] \tag{III.5} $$

(this result is Theorem III.1 in Appendix III). (III.5) shows that the spectrum
$f_\xi(\lambda)$ of the sampled series $\{\xi_k\}$ cannot, in general, offer
precise information on the spectrum of the original process $x(t)$, since, for
any $\lambda \in \Pi$, $f_\xi(\lambda)$ is composed of

$$ f_x\!\left(\frac{\lambda + 2k\pi}{\Delta}\right), \quad k = 0, \pm 1, \pm 2, \ldots, \tag{III.6} $$

and for different sampling intervals $\Delta$ we may obtain different spectra
$f_\xi(\lambda)$. When $x(t)$ satisfies the condition

$$ f_x(\lambda) = 0, \quad |\lambda| > 2\pi W, \tag{III.7} $$

and we select $\Delta = 1/(2W)$, then we can restore the original spectrum of $x(t)$:

$$ f_x(\lambda) = \begin{cases} \dfrac{1}{2W}\, f_\xi\!\left(\dfrac{\lambda}{2W}\right), & |\lambda| < 2\pi W, \\ 0, & |\lambda| > 2\pi W, \end{cases} \tag{III.8} $$

(see Theorem III.2 in Appendix III).
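The aliasing phenomenon can be illustrated numerically. The following is only a minimal sketch in Python (standard library only); the helper name `alias_frequency` and the 70 Hz test signal are illustrative assumptions, not part of the original study. It shows that, with the sampling interval of (III.3), a component above W = 50 Hz produces exactly the same samples as a lower-frequency component.

```python
import math

def alias_frequency(f_hz, delta):
    """Frequency in [0, 1/(2*delta)] onto which f_hz folds when sampling every delta seconds."""
    fs = 1.0 / delta          # sampling rate in Hz
    f = f_hz % fs             # the spectrum of the sampled series is periodic with period fs
    return min(f, fs - f)     # fold into the band [0, fs/2]

DELTA = 0.01  # sampling interval of (III.3), i.e. W = 50 Hz

# Hypothetical 70 Hz cosine (violates the bound W = 50 Hz) versus a 30 Hz cosine.
samples_70 = [math.cos(2 * math.pi * 70 * k * DELTA) for k in range(20)]
samples_30 = [math.cos(2 * math.pi * 30 * k * DELTA) for k in range(20)]

# Largest difference between the two sampled series; rounding error only.
max_diff = max(abs(a - b) for a, b in zip(samples_70, samples_30))

print(alias_frequency(70.0, DELTA), max_diff)
```

After sampling, the two signals cannot be told apart, which is why the bound W must really dominate the frequencies present in the record.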

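Now, we consider the main part of the VEP record of each person as a realization
of a stationary stochastic process and suppose also that the spectral density of
$\{\xi_k\}$ exists and satisfies some mathematical conditions. Then, according to
the window spectral estimation (see (3.206), Chapter 3), we have

$$ \hat f_k = \hat f(\lambda_k) = \frac{1}{2\pi} \sum_{s=-m_N}^{m_N} w_N(s)\, \hat\gamma(s)\, e^{-is\lambda_k}, \quad \lambda_k = \frac{k\pi}{m_N}, \quad k = 0, 1, 2, \ldots, m_N, \tag{III.9} $$

where $w_N(s)$ are the window coefficients in the time domain, and

$$ y(k) = \xi_k - \frac{1}{N}\sum_{j=1}^{N} \xi_j, \qquad \hat\gamma(s) = \frac{1}{N} \sum_{k=1}^{N-s} y(k+s)\, y(k), \quad s = 0, 1, \ldots, m_N, \tag{III.10} $$

with $\hat\gamma(-s) = \hat\gamma(s)$, where $m_N$ may be selected as $K\sqrt{N}$,
$K = 1.0 \sim 3.0$.

In the present case, $w_N(s)$ is given by the Bartlett kernel

$$ w_N(s) = \begin{cases} 1 - \dfrac{|s|}{m_N}, & |s| \le m_N, \\ 0, & \text{otherwise.} \end{cases} \tag{III.11} $$

In (III.9), $\Delta = 1/(2W) = 0.01$ (sec), $N = 50$, $m_N = 15$, i.e.
$\{\hat f_k,\ k = 0, 1, 2, \ldots, 15\}$ is distributed on $[0, 50]$ (Hz) with spacing

$$ \Delta f = \frac{1}{2\Delta\, m_N} = \frac{50}{15} = 3.33 \ \text{(Hz)}. \tag{III.12} $$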
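The window estimate can be sketched in a few lines of Python. This is a hedged illustration only: it assumes the standard Bartlett lag-window form (biased sample autocovariance, triangular weights, frequencies $k\pi/m_N$), and the function name `bartlett_spectrum` and the 10 Hz test record are hypothetical, not taken from the book.

```python
import math

def bartlett_spectrum(xi, m):
    """Bartlett-windowed spectral estimates at lambda_k = k*pi/m, k = 0..m."""
    n = len(xi)
    mean = sum(xi) / n
    y = [v - mean for v in xi]                       # centred record
    # biased sample autocovariance gamma(s), s = 0..m
    gamma = [sum(y[k + s] * y[k] for k in range(n - s)) / n for s in range(m + 1)]
    estimates = []
    for k in range(m + 1):
        lam = k * math.pi / m
        total = gamma[0]
        for s in range(1, m + 1):
            weight = 1.0 - s / m                     # Bartlett weight for lag s
            total += 2.0 * weight * gamma[s] * math.cos(s * lam)  # lags +s and -s together
        estimates.append(total / (2.0 * math.pi))
    return estimates

# Hypothetical record: a pure 10 Hz cosine sampled every 0.01 sec, N = 50 points.
record = [math.cos(2 * math.pi * 10 * k * 0.01) for k in range(1, 51)]
spectrum = bartlett_spectrum(record, 15)
peak_index = spectrum.index(max(spectrum))
print(peak_index, peak_index * 50.0 / 15.0)   # index of the peak and its frequency in Hz
```

With $m_N = 15$ the grid spacing is 50/15 Hz, so a 10 Hz rhythm is expected to show up at index 3 of the estimate.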

Since the observations of the children were carried out at different times, the
output amplitudes of the records cannot be adjusted to the same level. In order
to make comparisons, it is necessary to normalize all the estimated spectra to a
standard value. We may use

$$ \bar f_k = \frac{\hat f_k}{\sum_{j=0}^{m_N} \hat f_j}, \quad k = 0, 1, 2, \ldots, m_N, \tag{III.13} $$

in place of $\hat f_k$; this procedure is equivalent to assuming that all of the
VEP records possess the same average power.

Indeed, the variance of a process may be understood as its average power, and we
may approximate the integral of the spectrum by a summation, i.e.

$$ \operatorname{Var}(\xi_t) = \int_{-\pi}^{\pi} f_\xi(\lambda)\, d\lambda \approx \sum_k \hat f(\lambda_k)\, \Delta_\lambda, \tag{III.14} $$

where $\Delta_\lambda$ denotes the spacing of the frequencies $\lambda_k$; then

$$ \frac{\Delta_\lambda\, \hat f(\lambda_k)}{\sum_j \hat f(\lambda_j)\, \Delta_\lambda} = \frac{\hat f_k}{\sum_j \hat f_j} = \bar f_k \tag{III.15} $$

and

$$ \bar f_k \propto \hat f_k \quad \text{when } \operatorname{Var}(\xi_t) = \text{const.} \tag{III.16} $$
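The effect of the normalization (III.13) can be checked with a short Python sketch. The values below are hypothetical and only illustrate the point that a constant gain factor between two recordings disappears after normalization; `normalize_spectrum` is an illustrative name.

```python
def normalize_spectrum(estimates):
    """Divide every spectral estimate by the sum of all estimates, as in (III.13)."""
    total = sum(estimates)
    if total <= 0.0:
        raise ValueError("spectral estimates must have a positive sum")
    return [value / total for value in estimates]

raw = [2.0, 6.0, 8.0, 4.0]                 # hypothetical spectral estimates of one record
rescaled = [10.0 * v for v in raw]         # same record with its spectrum scaled by a constant

normalized_raw = normalize_spectrum(raw)
normalized_rescaled = normalize_spectrum(rescaled)
print(normalized_raw)
```

Both versions give the same normalized spectrum, so records taken at different output levels become comparable.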


Now, for the dull and normal populations, the main average estimated spectra
are listed in Table III.2 and plotted in Fig. III.3.


Table III.2

Est. Spec.      $\bar f_0$   $\bar f_1$   $\bar f_2$   $\bar f_3$   $\bar f_4$   $\bar f_5$
Norm. Popul.    0.181        0.259        0.273        0.148        0.053        0.0219
Dull Popul.     0.221        0.322        0.256        0.083        0.039        0.017


Fig. III.3 Estimated spectra for dull and normal populations

3. Statistical Analysis for the Detection of Characteristics

In order to find the characteristics of the two populations mentioned above, it
is necessary to carry out further statistical analysis of the spectral estimates
$\{\hat f_k\}$.

Now, a theorem on the asymptotic distribution of spectral estimates is needed
for the analysis that follows (see David R. Brillinger (1981)):

Theorem III.3 (Asymptotic Distribution of Spectral Estimates). Suppose
that $x(t)$ is a real Gaussian stationary series whose covariance function $R(k)$
satisfies the condition

$$ \sum_k (1 + |k|)\, |R(k)| < +\infty; \tag{III.17} $$

then the windowed spectral estimates $(\hat f_1, \hat f_2, \ldots, \hat f_l)$,
$(l < m_N)$, given by (III.9) are asymptotically independent random variables
with the asymptotically normal distribution

$$ \hat f_k \sim N\!\left( f_k,\ 2\pi\, \frac{m_N}{N}\, f_k^2 \int_{-\pi}^{\pi} w^2(\lambda)\, d\lambda \right), \tag{III.18} $$

where

$$ \int_{-\pi}^{\pi} w(\lambda)\, d\lambda = 1. \tag{III.19} $$

Accordingly, we carry out the following two statistical analyses:

(1) t-test for each frequency of the N and D populations.

Based on Theorem III.3, for each pair $(\bar f_k^{(N)}, \bar f_k^{(D)})$,
$k = 0, 1, 2, \ldots, m$, we may test the hypothesis

$$ H_0: \ E\bar f_k^{(D)} = E\bar f_k^{(N)} \quad (k \text{ fixed, variance unknown}). $$

We may assume that the populations are normally distributed with the same
variance, so that the test statistic may be selected as

$$ t = \frac{\bar F_k^{(N)} - \bar F_k^{(D)}}{S_w \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \tag{III.20} $$

$$ S_w^2 = \frac{(n_1 - 1) S_N^2 + (n_2 - 1) S_D^2}{n_1 + n_2 - 2}, \tag{III.21} $$

where $\bar F_k^{(N)}$ and $\bar F_k^{(D)}$ are respectively the means of the
samples $\{\bar f_{k,i}^{(N)}\}$ and $\{\bar f_{k,j}^{(D)}\}$, and $S_N^2$, $S_D^2$
are the sample variances.

The statistic $t$ of (III.20) follows the $t_{n_1+n_2-2}$ distribution.

For $k = 0, 1, \ldots, 10$, we found that for $k = 1, 3$ the hypothesis was
rejected by the test at the significance level $\alpha = 0.01$. That means the
two populations (N and D) are significantly different at the frequencies

$$ f_1 = 3.33 \ \text{(Hz)}, \qquad f_3 = 10 \ \text{(Hz)}. \tag{III.22} $$

We know that in physiology the former frequency $f_1 = 3.33$ Hz belongs to the
δ-wave and the latter one, $f_3 = 10$ Hz, to the α-wave.*
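The test applied at each frequency can be sketched as follows. This is a minimal Python illustration of the usual equal-variance two-sample t statistic, assumed here to be the form of (III.20); the sample values are hypothetical and are not data from the study.

```python
import math

def pooled_t_statistic(sample_a, sample_b):
    """Two-sample t statistic with pooled variance and its degrees of freedom."""
    n1, n2 = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n1
    mean_b = sum(sample_b) / n2
    ss_a = sum((v - mean_a) ** 2 for v in sample_a)   # within-sample sums of squares
    ss_b = sum((v - mean_b) ** 2 for v in sample_b)
    dof = n1 + n2 - 2
    pooled_sd = math.sqrt((ss_a + ss_b) / dof)
    t = (mean_a - mean_b) / (pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2))
    return t, dof

# Hypothetical normalized spectral values at one frequency for two small groups.
group_d = [0.30, 0.34, 0.32]
group_n = [0.24, 0.26, 0.25]
t_value, dof = pooled_t_statistic(group_d, group_n)
print(t_value, dof)
```

The resulting value is compared with the critical value of the t distribution with `dof` degrees of freedom at the chosen level.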


(2) Discriminant analysis.

We may further carry out discriminant analysis with the pairs
$\{(\bar f_1, \bar f_3)\}$ as the characteristics of the two populations, i.e. put

$$ \begin{cases} N_i = (\bar f_1^{(N,i)}, \bar f_3^{(N,i)})^T, & i = 1, 2, \ldots, n_1, \\ D_j = (\bar f_1^{(D,j)}, \bar f_3^{(D,j)})^T, & j = 1, 2, \ldots, n_2, \end{cases} \tag{III.23} $$

as vector samples obtained from the two populations; then the linear discriminant
analysis can be carried out as follows (see Karson (1982)):

The sample means are

$$ \begin{cases} \bar D = (0.3217,\ 0.0832)^T, \\ \bar N = (0.2488,\ 0.1603)^T, \end{cases} \tag{III.24} $$

the sample inverse covariance matrix is

$$ S^{-1} = \begin{pmatrix} 3.60634 & 3.98466 \\ 3.98466 & 4.96094 \end{pmatrix} \times 10^3, \tag{III.25} $$

and the coefficients are

$$ \begin{cases} C_N = S^{-1}\bar N = (1536.00,\ 1786.62)^T, \\ C_D = S^{-1}\bar D = (1491.68,\ 1694.62)^T, \\ C_{0,N} = -\tfrac{1}{2}\bar N^T C_N = -334.28, \\ C_{0,D} = -\tfrac{1}{2}\bar D^T C_D = -310.43. \end{cases} \tag{III.26} $$

So, the linear discriminant functions are

$$ \begin{cases} L_N(f_1, f_3) = C_{0,N} + C_N^{(1)} f_1 + C_N^{(2)} f_3, \\ L_D(f_1, f_3) = C_{0,D} + C_D^{(1)} f_1 + C_D^{(2)} f_3. \end{cases} \tag{III.27} $$


*In neurophysiology, the frequency band 0 ~ 4 Hz is called the δ-wave, 4 ~ 8 Hz
the θ-wave, 8 ~ 14 Hz the α-wave and 14 ~ 31 Hz the β-wave.

For any other VEP record, we may obtain the sample $F = (\bar f_1, \bar f_3)^T$
by the procedure mentioned above and substitute it into (III.27); then the
conclusion is

$$ \begin{cases} F \in N, & \text{if } L_N(f_1, f_3) > L_D(f_1, f_3), \\ F \in D, & \text{otherwise,} \end{cases} \tag{III.28} $$

or equivalently,

$$ \begin{cases} F \in N, & \text{if } 44.32 f_1 + 92 f_3 > 23.85, \\ F \in D, & \text{otherwise.} \end{cases} \tag{III.29} $$
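The arithmetic behind the discriminant rule can be reproduced with a short Python sketch. It uses only the sample means and the inverse covariance matrix quoted in the text (with the factor $10^3$); the function names are illustrative assumptions.

```python
S_INV = [[3606.34, 3984.66], [3984.66, 4960.94]]   # inverse covariance, factor 10^3 included
MEAN_N = (0.2488, 0.1603)                          # sample mean of the normal population
MEAN_D = (0.3217, 0.0832)                          # sample mean of the dull population

def discriminant_coefficients(s_inv, mean):
    """Return (C0, C) with C = S^{-1} mean and C0 = -0.5 * mean^T C."""
    c = [s_inv[0][0] * mean[0] + s_inv[0][1] * mean[1],
         s_inv[1][0] * mean[0] + s_inv[1][1] * mean[1]]
    c0 = -0.5 * (mean[0] * c[0] + mean[1] * c[1])
    return c0, c

def classify(f1, f3):
    """Assign a sample (f1, f3) to 'N' or 'D' by comparing the two linear functions."""
    c0_n, c_n = discriminant_coefficients(S_INV, MEAN_N)
    c0_d, c_d = discriminant_coefficients(S_INV, MEAN_D)
    l_n = c0_n + c_n[0] * f1 + c_n[1] * f3
    l_d = c0_d + c_d[0] * f1 + c_d[1] * f3
    return "N" if l_n > l_d else "D"

c0_n, c_n = discriminant_coefficients(S_INV, MEAN_N)
c0_d, c_d = discriminant_coefficients(S_INV, MEAN_D)
print(c_n, c0_n, c_d, c0_d)
print(classify(*MEAN_N), classify(*MEAN_D))
```

Subtracting the two linear functions gives the single inequality $44.32 f_1 + 92 f_3 > 23.85$, which is the form used for routine classification.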


The discriminant analysis for our samples is shown in Fig. III.4.

Fig. III.4 Discriminant analysis of $(\bar f_1, \bar f_3)$ for populations N and D.

The efficiency of such an analysis can usually be checked by Hotelling’s $T^2$-test.

For $H_0: \mu_1 = \mu_2$ ($\Sigma$ unknown), the test statistic is

$$ T^2 = \frac{n_1 n_2}{n_1 + n_2}\, (\bar x_1 - \bar x_2)^T S_{12}^{-1} (\bar x_1 - \bar x_2), \tag{III.30} $$

and follows the $T^2(n_1 + n_2 - 2)$ distribution, where

$$ \begin{cases} \bar x_g = (\bar x_{1g}, \ldots, \bar x_{mg})^T, & g = 1, 2; \\ \bar x_{ig} = \dfrac{1}{n_g} \sum_{k=1}^{n_g} x_{igk}, & i = 1, 2, \ldots, m; \ g = 1, 2; \\ S_{12} = (s_{ij})_{m \times m}, \\ s_{ij} = \dfrac{1}{n_1 + n_2 - 2} \sum_{g=1}^{2} \sum_{k=1}^{n_g} (x_{igk} - \bar x_{ig})(x_{jgk} - \bar x_{jg}). \end{cases} \tag{III.31} $$

Table III.3

N-population    D-population
0.0233          0.0606
0.0239          0.0597
0.0260          0.0577
0.0307          0.0561
0.0410          0.0570
0.0660          0.0620
0.1057          0.0665
0.1313          0.0621
0.0740          0.0470
0.0410          0.0380


We know that the relationship between the F-distribution and the
$T^2$-distribution is

$$ F = \frac{n_1 + n_2 - m - 1}{(n_1 + n_2 - 2)\, m}\, T^2 \sim F_m(n_1 + n_2 - m - 1) \tag{III.32} $$

(see Karson (1982)), so the test will reject $H_0$ when $F > F_\alpha$, where
$F_\alpha$ is the critical value at the level $\alpha$.

In our case, $m = 2$ and $F_{nd} = 13.09$. For $\alpha = 0.01$,

$$ F_2(26) = 5.53 < F_{nd} = 13.09, \tag{III.33} $$

which shows that the discriminant analysis carried out above is highly efficient.
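The computation of $T^2$ and its conversion to an F value can be sketched for the two-variate case ($m = 2$). The Python code below is an illustration under the standard definitions of the pooled covariance and of the $T^2$-to-F relation; the two small groups of points are hypothetical and chosen only so that the result is easy to verify by hand.

```python
def mean_vector(rows):
    """Component-wise mean of a list of 2-dimensional samples."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]

def hotelling_t2(group1, group2):
    """Return (T^2, F, (m, n1 + n2 - m - 1)) for two groups of 2-dimensional samples."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, mean in ((group1, m1), (group2, m2)):   # pooled within-group scatter
        for r in rows:
            d = (r[0] - mean[0], r[1] - mean[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = n1 + n2 - 2
    s = [[v / dof for v in row] for row in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    d = (m1[0] - m2[0], m1[1] - m2[1])
    quad = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
            + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    t2 = n1 * n2 / (n1 + n2) * quad
    m = 2
    f = (n1 + n2 - m - 1) / ((n1 + n2 - 2) * m) * t2
    return t2, f, (m, n1 + n2 - m - 1)

# Hypothetical groups, each of four points on the corners of a square.
g1 = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
g2 = [(3.0, 3.0), (5.0, 3.0), (3.0, 5.0), (5.0, 5.0)]
t2, f_value, f_dof = hotelling_t2(g1, g2)
print(t2, f_value, f_dof)
```

The F value is then compared with the tabulated critical value for the degrees of freedom returned in `f_dof`.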

An interesting point is that Ye (1985) studied the same topic by the M.E. method
and obtained the same conclusion in her analysis. The processing steps of Ye may
be summarized as follows:

Step 1. Difference the discrete data with order 1, i.e. put

$$ x_i = \xi_{i+1} - \xi_i, \quad i = 1, 2, \ldots, n - 1, \tag{III.34} $$

where $\{\xi_k\}$ is the sample series (III.4).

Step 2. Fit an AR(p) model by M.E. modelling (see Chapter 3), with the order
selected by AIC, and divide the interval $[0, \pi]$ into $T = 39$ segments.
Then the spectral estimates are

$$ \hat f_k = \frac{\hat\sigma^2}{2\pi \left| \sum_{s=0}^{p} \hat\varphi_s \exp\{-isk\pi/T\} \right|^2}, \quad k = 0, 1, 2, \ldots, T. \tag{III.35} $$

For the Dull and Normal populations, the estimated results are listed in Table
III.3 and plotted in Fig. III.5.

Fig. III.5 Spectral estimates for N, D populations by M.E. method.
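The two processing steps can be sketched in Python once the AR coefficients are available. The sketch below does not fit the model; it assumes the coefficients $\varphi_0 = 1, \varphi_1, \ldots, \varphi_p$ and the innovation variance have already been obtained by some fitting procedure, and it evaluates the standard AR spectral density on the grid $k\pi/T$. The function names and the AR(1) example are hypothetical.

```python
import cmath
import math

def difference(series):
    """First-order differencing: x_i = xi_{i+1} - xi_i."""
    return [b - a for a, b in zip(series, series[1:])]

def ar_spectrum(phi, sigma2, n_segments):
    """AR spectral density at lambda_k = k*pi/n_segments, k = 0..n_segments.

    phi[0] is assumed to be 1, with the model sum_s phi[s] * x(t - s) = e(t).
    """
    estimates = []
    for k in range(n_segments + 1):
        lam = k * math.pi / n_segments
        transfer = sum(p * cmath.exp(-1j * s * lam) for s, p in enumerate(phi))
        estimates.append(sigma2 / (2.0 * math.pi * abs(transfer) ** 2))
    return estimates

# Hypothetical AR(1) model x(t) - 0.5 x(t-1) = e(t) with unit innovation variance.
spec = ar_spectrum([1.0, -0.5], 1.0, 39)
print(difference([1.0, 4.0, 9.0]), spec[0], spec[-1])
```

With $T = 39$ and a 50 Hz band, the grid spacing is 50/39 Hz, which gives the frequencies 5.13 Hz, 7.69 Hz and 10.25 Hz for $k = 4, 6, 8$.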

$H_0$ is rejected by the t-test (III.20) at $k = 0, 1, 2, 3, 4$ and $k = 6, 7, 8$,
which shows that:

(1) There are statistical differences between the N and D populations,
particularly in the low-frequency range $\{f \le f_4 = 5.13 \text{ (Hz)}\}$, i.e.
in the range of the δ- and θ-waves.

(2) In the range $\{f_6 = 7.69 \text{ (Hz)} \sim f_8 = 10.25 \text{ (Hz)}\}$, i.e.
in the range of the α-wave, the two populations have the greatest difference.

No doubt, these results strengthen the earlier results obtained by windowed
spectral analysis.

4. Physiological Interpretation 

One of the important things for applied mathematicians is that their research
results should be in accord with the laws of the field to which the topic belongs
and be generally acceptable to the experts.

Physiologists agree with the results revealed by the time series analysis above,
since they had the same intuitive feeling but had no successful quantitative
analysis before. Furthermore, they gave a very good interpretation of our
conclusions: the brain wave components of human beings develop gradually from
low frequencies upward as a baby grows into an adult. In physiology, the α-wave
component is an important rhythm for the mature growth of normal children. Hence
the high frequencies are stronger in normal children than in dull children, and
the contrary situation in the low-frequency domain illustrates that the
high-frequency components in the brain waves of dull children could not
physically develop well, and their intelligence more or less stays at a low
level.

Finally, our research supports the report of Liu and his research group, which
concluded that the α-wave and θ-wave both occur in the brain waves of dull
children, with the θ-wave appearing more prominently than the α-wave.



Appendix III

Sampling Interval and Aliasing


Theorem III.1. Suppose that $x(t)$ is a stationary process and its covariance
function $R(u)$ satisfies

$$ \int_{-\infty}^{\infty} |R(u)|\, du < +\infty. \tag{III.36} $$

Putting

$$ \eta_k = x(k\Delta), \quad k = 0, \pm 1, \pm 2, \ldots, \quad \text{for } \Delta > 0, \tag{III.37} $$

then the spectral density function of $\{\eta_k\}$ is

$$ f_\eta(\lambda) = \frac{1}{\Delta} \sum_{k=-\infty}^{\infty} f_x\!\left(\frac{\lambda + 2k\pi}{\Delta}\right), \tag{III.38} $$

where $f_x(\lambda)$ is the spectral density of $x(t)$.


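Proof. Since $x(t)$ is a stationary process, it can be represented as a
stochastic integral (see Chapter 1)

$$ x(t) = \int_{-\infty}^{\infty} e^{i\lambda t}\, dZ_x(\lambda), \quad t \in R^1, \tag{III.39} $$

and the stationarity of $\{\eta_k\}$ is evident from the following equalities:

$$ E\eta_k = Ex(k\Delta) = \text{const.}, \quad \forall k = 0, \pm 1, \pm 2, \ldots; \tag{III.40} $$

by (III.39) we have

$$ E\eta_{k+m}\overline{\eta_k} = (\eta_{k+m}, \eta_k) \tag{III.41} $$

$$ = \left( \int_{-\infty}^{\infty} e^{i(k+m)\lambda\Delta}\, dZ_x(\lambda),\ \int_{-\infty}^{\infty} e^{ik\lambda\Delta}\, dZ_x(\lambda) \right)_H \tag{III.42} $$

$$ = \int_{-\infty}^{\infty} e^{im\lambda\Delta} f_x(\lambda)\, d\lambda, \quad \text{being independent of } k, \tag{III.43} $$

where the inner product of (III.42) is in the Hilbert space $H$ (see Chapter 1).

Now, the time series $\{\eta_k\}$ is stationary, so its covariance function can be
represented as

$$ R_\eta(m) = \int_{-\pi}^{\pi} e^{im\tau} f_\eta(\tau)\, d\tau, \tag{III.44} $$

and the integral of (III.43) can also be rewritten into the form of (III.44).
Indeed, putting $\Lambda = \lambda\Delta$ and rewriting
$R^1 = \bigcup_k [(2k-1)\pi, (2k+1)\pi)$, we have

$$ \begin{aligned} \int_{-\infty}^{\infty} e^{im\lambda\Delta} f_x(\lambda)\, d\lambda &= \frac{1}{\Delta} \int_{-\infty}^{\infty} e^{im\Lambda} f_x\!\left(\frac{\Lambda}{\Delta}\right) d\Lambda \\ &= \sum_k \frac{1}{\Delta} \int_{(2k-1)\pi}^{(2k+1)\pi} e^{im(\Lambda - 2k\pi)} f_x\!\left(\frac{\Lambda}{\Delta}\right) d\Lambda \\ &= \int_{-\pi}^{\pi} e^{im\tau} \left[ \frac{1}{\Delta} \sum_k f_x\!\left(\frac{\tau + 2k\pi}{\Delta}\right) \right] d\tau. \end{aligned} \tag{III.45} $$

According to the uniqueness of the representation of the covariance function (see
Chapter 1) and upon comparison of (III.45) with (III.44), we obtain the equality

$$ f_\eta(\tau) = \frac{1}{\Delta} \sum_k f_x\!\left(\frac{\tau + 2k\pi}{\Delta}\right), \quad \text{a.e. } [d\tau]. \tag{III.46} $$

That ends the proof of the theorem.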


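Theorem III.2. Under the conditions of Theorem III.1, if the spectral density
function of $x(t)$ satisfies

$$ f_x(\lambda) = 0, \quad \text{for } |\lambda| > 2\pi W, \tag{III.47} $$

and we put

$$ \eta_n = x(n\Delta), \quad \Delta = \frac{1}{2W}, $$

then $f_x(\lambda)$ can be represented in terms of $f_\eta(\lambda)$, namely,

$$ f_x(\lambda) = \begin{cases} \dfrac{1}{2W}\, f_\eta\!\left(\dfrac{\lambda}{2W}\right), & |\lambda| \le 2\pi W, \\ 0, & \text{otherwise.} \end{cases} \tag{III.48} $$

Proof. In Theorem III.1, we have the representation

$$ f_\eta(\lambda) = \frac{1}{\Delta} \left[ \cdots + f_x\!\left(\frac{\lambda}{\Delta} - \frac{2\pi}{\Delta}\right) + f_x\!\left(\frac{\lambda}{\Delta}\right) + f_x\!\left(\frac{\lambda}{\Delta} + \frac{2\pi}{\Delta}\right) + \cdots \right], $$

and for $|\lambda| < \pi$ and $n \ge 1$,

$$ \frac{\lambda}{\Delta} - \frac{2\pi}{\Delta} n < \frac{\pi}{\Delta} - \frac{2\pi}{\Delta} n \le -\frac{\pi}{\Delta} = -2\pi W, $$

$$ \frac{\lambda}{\Delta} + \frac{2\pi}{\Delta} n > -\frac{\pi}{\Delta} + \frac{2\pi}{\Delta} n \ge \frac{\pi}{\Delta} = 2\pi W, $$

then

$$ f_x\!\left(\frac{\lambda}{\Delta} \pm \frac{2\pi}{\Delta} n\right) = 0, \quad \text{for } n \ge 1. $$

Hence, for $\{\eta_n\}$, we have

$$ f_\eta(\lambda) = \frac{1}{\Delta}\, f_x\!\left(\frac{\lambda}{\Delta}\right), \quad |\lambda| < \pi. $$

Putting $\tau = \lambda/\Delta$, then

$$ f_\eta(\tau\Delta) = \frac{1}{\Delta}\, f_x(\tau), \quad |\tau| < \frac{\pi}{\Delta}, $$

i.e.

$$ f_x(\tau) = \Delta\, f_\eta(\tau\Delta), \quad |\tau| < \frac{\pi}{\Delta} = 2\pi W. \qquad \blacksquare $$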

CASE IV 


Statistical Analysis of VEP and AI 
by the Principal Component Analysis 
of Time Series in Frequency Domain 


1. Introduction 

In the study of Case III, we obtained successful results from analyzing the VEP
versus the II by spectral analysis. Another interesting problem arises: can we
use time series methods to analyze some higher intelligence behaviour by VEP?
For example, researchers in aeronautic medicine want to know whether it is
possible to obtain information on physiological characteristics related to
aviation intelligence (AI) from the VEP of students in an aviation training
center.

The first thing to do in this study is naturally to try the method of spectral
analysis as in Case III. A discriminant analysis of two populations is shown in
Fig. IV.1, where “population E” and “population P” are respectively the groups
of students who will and who will not come out as qualified aviators upon
completion of their training. The judgement whether a student belongs to
population E or P is based on a series of tests under the supervision of
experienced teachers.

In Case III, we have seen from Fig. III.4 that the discriminant analysis of the
VEPs of dull and normal children by a linear function clearly separates the
sample space into two regions. But Fig. IV.1 fails to reveal a demarcation among
the sample students: the “crosses” and “dots” are thoroughly mixed together.
Such an outcome is not surprising. In Case III the main object is to distinguish
the normal children from those with Down’s disease, and a simple discriminant
function is sufficient for the work, whereas for students in the aviation center
not only do some physiological characteristics need to be studied, but good and
quick brains are also required. As everybody knows, from the very first day a
student enters such an education center, he undergoes different kinds of hard
tests and severe discipline.

The above study inspires us to detect various characteristics of AI for
populations E and P, and to find a statistical criterion much more complex than
that of Case III for analyzing the data.

There are three methods generally used to further our research:

1. Multivariate regression analysis.

2. Nearest neighbour rule classification analysis of time series.

3. Tapping the information in multivariate processes by the principal component
approach.

The first one is very popular in data analysis, and the third one will be
introduced in the sequel. The second one was introduced by Gelsch (1981). He
suggested the following procedure for the classification of time series:

Let

$$ x_k^{(J)}(t), \quad t = 1, 2, \ldots, T, \quad k = 1, 2, \ldots, K \tag{IV.1} $$

be the k-th sample series of population $J$ observed, and put

$$ \hat\mu_J = (\bar x^{(J)}(1), \ldots, \bar x^{(J)}(T))^T, \tag{IV.2} $$

$$ \hat\Sigma_J = (\hat\sigma^{(J)}_{tj})_{T \times T} \tag{IV.3} $$

to denote the sample mean and covariance, where

$$ \bar x^{(J)}(t) = \frac{1}{K} \sum_{k=1}^{K} x_k^{(J)}(t), \tag{IV.4} $$

$$ \hat\sigma^{(J)}_{tj} = \frac{1}{K} \sum_{k=1}^{K} \left(x_k^{(J)}(t) - \bar x^{(J)}(t)\right)\left(x_k^{(J)}(j) - \bar x^{(J)}(j)\right), \tag{IV.5} $$

where $t, j = 1, 2, \ldots, T$.

Let $X^{(J)}$ represent the time series of population $J$. Then, under the
assumption of a Gaussian distribution for the process, Gelsch obtained a very
useful “distance” function for two time series $X^{(0)}$ and $X^{(m)}$:

$$ 2d(X^{(0)}, X^{(m)}) = \log\frac{|\Sigma_m|}{|\Sigma_0|} + \operatorname{tr} \Sigma_m^{-1}\Sigma_0 - T + \operatorname{tr} \Sigma_m^{-1}(\mu_m - \mu_0)(\mu_m - \mu_0)^T. \tag{IV.6} $$

No doubt, this is a very interesting function in the time domain for the
classification of time series. Unfortunately, in our case, we have
$T = 50 \sim 100$, which means that the matrix $\Sigma_J$ is of order 50 × 50 or
100 × 100, and calculating its determinant $|\Sigma_J|$ and inverse matrix
$\Sigma_J^{-1}$ is generally tedious.

We shall study the problem of aviation students by the principal component
approach. Having satisfactorily tested 101 practical samples statistically in
1987, we report the case in the following section.


2. Principal Component Analysis in Frequency Domain and Its Application
in AI Analysis

Four records of VEP for each student were provided by the researchers of the
Beijing Institute of Aeronautic Medicine, namely the left-side and right-side
VEPs of the forehead and the occipital region.

How to use these real records efficaciously is very important. Physiologists
believe that some disparity of frequency components in the VEP of the two
populations does exist, so we retain the approach of spectral analysis in the
frequency domain.

To tap as much of the information in the real records as possible, the principal
component analysis of multivariate time series in the frequency domain is used
in the study. The basic results are briefly introduced below; readers can find
some basic knowledge in Appendix IV of this book.

Suppose that $y(t)$ is an n-variate stationary series whose spectral density
matrix $f(\lambda)$ exists with rank $m$ (a.e.) for $\lambda \in \Pi$
$(0 < m \le n)$. Let

$$ (\mu_1(\lambda), \mu_2(\lambda), \ldots, \mu_m(\lambda)), \quad \lambda \in \Pi, \tag{IV.7} $$

be the eigenvalues of the spectral matrix $f(\lambda)$.

Without loss of generality, we may assume that

$$ \mu_1(\lambda) \ge \mu_2(\lambda) \ge \cdots \ge \mu_m(\lambda) > 0, \quad \lambda \in \Pi, \tag{IV.8} $$

and correspondingly, their eigenvectors are

$$ v_1(\lambda), v_2(\lambda), \ldots, v_m(\lambda), \quad \lambda \in \Pi. \tag{IV.9} $$

Let $dZ_y(\lambda)$ be the stochastic measure of $y(t)$, namely, we have

$$ y(t) = \int_\Pi e^{i\lambda t}\, dZ_y(\lambda), \tag{IV.10} $$

and we call

$$ \zeta_j(t) = \int_\Pi e^{i\lambda t}\, v_j^*(\lambda)\, dZ_y(\lambda), \quad j = 1, 2, \ldots, m, \tag{IV.11} $$

the j-th principal component of $y(t)$.

For each student in the aviation training center, the VEP records are denoted as

$$ x(t) = (y^{(1)}(t), y^{(2)}(t), y^{(3)}(t), y^{(4)}(t)), \quad t = 1, 2, \ldots, N, \tag{IV.12} $$

where the indices (1), (2), (3) and (4) refer to the left forehead, left
occipital, right forehead and right occipital components respectively. To
simplify calculation, it is necessary to further reduce the dimension of the
vector process (IV.12).

After statistical testing, it is found that the left-side forehead and occipital
records show no significant difference from those of the right side.
Accordingly, we may restrict ourselves to the records of one side only, say the
forehead and occipital records of the left side, and process the vectors

$$ x(t) = (y^{(1)}(t), y^{(2)}(t)), \quad t = 1, 2, \ldots, N. \tag{IV.13} $$
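For the two-channel vector (IV.13) the spectral matrix at each frequency is a 2 × 2 Hermitian matrix, so its eigenvalues and first eigenvector have a closed form. The Python sketch below illustrates this under that assumption; the function name `first_principal_axis` and the numerical matrix are hypothetical and serve only as a check of the algebra.

```python
import math

def first_principal_axis(f11, f12, f22):
    """Eigenvalues (mu1 >= mu2) and unit first eigenvector of [[f11, f12], [conj(f12), f22]]."""
    half_trace = 0.5 * (f11 + f22)
    radius = math.sqrt((0.5 * (f11 - f22)) ** 2 + abs(f12) ** 2)
    mu1 = half_trace + radius
    mu2 = half_trace - radius
    if abs(f12) > 0.0:
        v = (f12, mu1 - f11)          # solves (F - mu1 * I) v = 0
    else:
        v = (1.0, 0.0) if f11 >= f22 else (0.0, 1.0)
    norm = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
    return mu1, mu2, (v[0] / norm, v[1] / norm)

# Hypothetical spectral matrix at one frequency.
mu1, mu2, v1 = first_principal_axis(2.0, 1.0, 2.0)
print(mu1, mu2, v1)
```

Repeating this at every frequency of the grid gives the functions $\mu_1(\lambda)$ and $v_1(\lambda)$ used in the principal component analysis.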

The procedure for detecting the characteristics of the AI of the two populations
P and E is outlined as follows:

Step 1. Let $y_{ig}^{(1)}(t)$ be the left forehead VEP process and
$y_{ig}^{(2)}(t)$ the occipital VEP process of the same side for the i-th person
in population $g$, where $g = 1$ denotes the “excellent” and $g = 2$ the “poor”.
According to the upper bound frequency $W = 50$ Hz for VEP selected in (III.2),
we may put $\Delta = 1/(2W) = 0.01$ (sec), write
$y_{ig}(t) = (y_{ig}^{(1)}(t), y_{ig}^{(2)}(t))$, and digitize it into

$$ x_{ig}(k) = y_{ig}(k\Delta) = (y_{ig}^{(1)}(k\Delta), y_{ig}^{(2)}(k\Delta)), \tag{IV.14} $$

where $k = 1, 2, \ldots, 51$, $i = 1, 2, \ldots, n_g$, and $n_g$ denotes the total
number of persons in population $g$.

Step 2. According to the research result of Ye (1985) (see Case III), it is
better to take a difference of the original VEP records, i.e. to put

$$ z_{ig}(t) = x_{ig}(t+1) - x_{ig}(t), \quad t = 1, 2, \ldots, 50, \quad g = 1, 2. \tag{IV.15} $$

Step 3. Estimate the spectral density matrix by the windowing method (see
Appendix IV):

$$ \hat f_{ig}(\lambda) = \left( \hat f_{kj}^{(ig)}(\lambda) \right)_{2 \times 2}, \tag{IV.16} $$

where the elements of the spectral matrix are

$$ \hat f_{kj}^{(ig)}(\lambda) = \frac{1}{2\pi} \sum_{s=1-N}^{N-1} w_{kj}^{(N)}(s)\, \hat R_{kj}^{(ig)}(s)\, e^{-is\lambda}, \tag{IV.17} $$

$$ \hat R_{kj}^{(ig)}(s) = \begin{cases} \dfrac{1}{N} \sum_{t=1}^{N-s} z_{ig}^{(k)}(t+s)\, z_{ig}^{(j)}(t), & s = 0, 1, \ldots, N-1, \\ \dfrac{1}{N} \sum_{t=1-s}^{N} z_{ig}^{(k)}(t+s)\, z_{ig}^{(j)}(t), & s = -1, -2, \ldots, -N+1. \end{cases} \tag{IV.18} $$

In the general case, the window coefficients $w_{kj}^{(N)}(s)$ may be selected as
different functions of $s$ for different $(k, j)$. For simplicity of calculation,
in our case we select

$$ w_{kj}^{(N)}(s) = w^{(N)}(s), \quad |s| \le m_N, \tag{IV.19} $$

i.e. (IV.17) has uniform windowing for different $(k, j)$:

$$ w^{(N)}(s) = \begin{cases} 1 - \dfrac{|s|}{m_N}, & |s| \le m_N, \\ 0, & \text{otherwise.} \end{cases} \tag{IV.20} $$
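Step 3 can be sketched in Python for a single frequency. This is an illustration only: it assumes the same Bartlett weights as in Case III for every matrix element and applies no mean removal to the differenced channels; the function names and the two synthetic channels are hypothetical.

```python
import cmath
import math

def cross_covariance(a, b, s):
    """(1/N) * sum over valid t of a(t+s) * b(t), for positive or negative lag s."""
    n = len(a)
    if s >= 0:
        return sum(a[t + s] * b[t] for t in range(n - s)) / n
    return sum(a[t + s] * b[t] for t in range(-s, n)) / n

def spectral_matrix(z1, z2, lam, m):
    """2 x 2 windowed spectral matrix estimate at frequency lam, truncation point m."""
    channels = (z1, z2)
    f = [[0j, 0j], [0j, 0j]]
    for k in range(2):
        for j in range(2):
            total = 0j
            for s in range(-m, m + 1):
                weight = 1.0 - abs(s) / m      # Bartlett weights (assumed for all elements)
                cov = cross_covariance(channels[k], channels[j], s)
                total += weight * cov * cmath.exp(-1j * s * lam)
            f[k][j] = total / (2.0 * math.pi)
    return f

# Two synthetic channels of length 50 sharing a common rhythm (hypothetical data).
z1 = [math.sin(0.3 * t) for t in range(50)]
z2 = [math.sin(0.3 * t + 0.5) + 0.1 * math.cos(1.1 * t) for t in range(50)]
f_hat = spectral_matrix(z1, z2, 0.3, 15)
print(f_hat[0][0].real, f_hat[0][1])
```

The estimate is Hermitian by construction, which is what makes the eigenvalue analysis of the next step possible.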


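Step 4. Normalize the eigenvectors.

By the formula (IV.11) of principal components, we shall find the eigenvalues of
the spectral matrix

$$ \hat f_{ig}(\lambda) = \begin{pmatrix} \hat f_{11}^{(ig)}(\lambda) & \hat f_{12}^{(ig)}(\lambda) \\ \hat f_{21}^{(ig)}(\lambda) & \hat f_{22}^{(ig)}(\lambda) \end{pmatrix}, \quad \lambda \in \Pi, \tag{IV.21} $$

where $i = 1, 2, \ldots, n_g$; $g = 1, 2$, as in Step 3.

Let $v_{ig,1}(\lambda)$ be the first eigenvector, corresponding to
$\mu_1(\lambda)$ (see (IV.8)),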



v, 9ll (A) = (0)0) 

(IV.22) 

and we may make 

a transformation such that 



O) = i. 

(IV.23) 

Now, putting 

. ,、、_ iO … 

inlOl 心 

(IV.24) 

where $i = 1, 2, \ldots, n_g$; $g = 1, 2$, then it can be seen that

$$ w_{ig}(\lambda) \ge 0, \ \lambda \in \Pi; \qquad \int_\Pi w_{ig}(\lambda)\, d\lambda = 1. \tag{IV.25} $$

(IV.25) means that we may consider the normalized function (IV.24) as a
probability density function for given $(i, g)$, which may also be considered as
a characteristic of the student labelled $(i, g)$.


Step 5. Find the population characteristic functions.

For $g = 1, 2$ and $i = 1, 2, \ldots, n_g$, we obtain two groups of functions

$$ \{w_{i,1}(\lambda),\ i = 1, 2, \ldots, n_1\} \quad \text{and} \quad \{w_{i,2}(\lambda),\ i = 1, 2, \ldots, n_2\}; \tag{IV.26} $$

then the following functions

$$ \bar W_1(\lambda) = \frac{1}{n_1} \sum_{j=1}^{n_1} w_{j,1}(\lambda), \qquad \bar W_2(\lambda) = \frac{1}{n_2} \sum_{j=1}^{n_2} w_{j,2}(\lambda), \quad \lambda \in \Pi, \tag{IV.27} $$

may be considered as two “representative” functions, one for each population.

Step 6. Define the Kullback-Leibler information quantities.

For the VEP records of each student, $y_{i,*}(t)$ (* means that the population is
not yet determined), through Steps 1-4 we obtain a function $w_{i,*}(\lambda)$,
$\lambda \in \Pi$, and may define the K-L information quantities as follows:

$$ \begin{aligned} A^{(1)}_{i,*} &= I(w_{i,*}, \bar W_1) = \int_\Pi \bar W_1(\lambda) \log\frac{\bar W_1(\lambda)}{w_{i,*}(\lambda)}\, d\lambda, \\ A^{(2)}_{i,*} &= I(w_{i,*}, \bar W_2) = \int_\Pi \bar W_2(\lambda) \log\frac{\bar W_2(\lambda)}{w_{i,*}(\lambda)}\, d\lambda. \end{aligned} \tag{IV.28} $$

Step 7. Discriminant analysis.

For $g = 1, 2$, randomly select $n_1 = n_2 = 20$ persons as learning samples; then
we may obtain the sample mean $E\vartheta_g$ and covariance matrix $\Sigma_g$ for

$$ \vartheta_i^{(g)} = (A^{(1)}_{i,g}, A^{(2)}_{i,g}), \quad i = 1, 2, \ldots, 20, \quad g = 1, 2. \tag{IV.29} $$

Now, for any student $y_{i,*}(t)$ for whom it is unknown whether he belongs to
population $g = 1$ or $g = 2$, we may denote his K-L information characteristics
from (IV.28) by

$$ \vartheta(*) = (A^{(1)}_{i,*}, A^{(2)}_{i,*}), \tag{IV.30} $$

determine which population he comes from by the following distance functions

$$ \begin{aligned} D^2(*, 1) &= (\vartheta(*) - E\vartheta_1)\, \Sigma_1^{-1}\, (\vartheta(*) - E\vartheta_1)^T, \\ D^2(*, 2) &= (\vartheta(*) - E\vartheta_2)\, \Sigma_2^{-1}\, (\vartheta(*) - E\vartheta_2)^T, \end{aligned} \tag{IV.31} $$

and make the decision

$$ y_{i,*}(t) \in \begin{cases} \text{population } g = 1, & \text{when } D^2(*, 1) < D^2(*, 2); \\ \text{population } g = 2, & \text{otherwise.} \end{cases} \tag{IV.32} $$

3. Practical Checking


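In the first stage of our research work, we observed 40 samples and obtained
$\bar W_1(\lambda)$, $\bar W_2(\lambda)$, as well as $(E\vartheta_1, \Sigma_1)$ and
$(E\vartheta_2, \Sigma_2)$:

$$ E\vartheta_1 = (0.5359,\ 0.5528), \qquad E\vartheta_2 = (0.5082,\ 0.4485), \tag{IV.33} $$

$$ \Sigma_1^{-1} = \begin{pmatrix} 121.776 & -111.089 \\ -111.089 & 108.417 \end{pmatrix}, \qquad \Sigma_2^{-1} = \begin{pmatrix} 81.806 & -90.029 \\ -90.029 & 107.67 \end{pmatrix}. \tag{IV.34} $$

For convenience of computation, we may factor the matrices (IV.34) into products
of two triangular matrices

$$ \Sigma_1^{-1} = L_1 L_1^T, \tag{IV.35} $$

$$ L_1 = \begin{pmatrix} 11.035 & 0 \\ -10.067 & 2.66 \end{pmatrix}, \tag{IV.36} $$

and

$$ \Sigma_2^{-1} = L_2 L_2^T, \tag{IV.37} $$

$$ L_2 = \begin{pmatrix} 9.045 & 0 \\ -9.954 & 2.93 \end{pmatrix}. \tag{IV.38} $$

Consequently we have

$$ \begin{aligned} D^2(*, 1) &= \left((\vartheta(*) - E\vartheta_1) L_1\right) \left((\vartheta(*) - E\vartheta_1) L_1\right)^T, \\ D^2(*, 2) &= \left((\vartheta(*) - E\vartheta_2) L_2\right) \left((\vartheta(*) - E\vartheta_2) L_2\right)^T. \end{aligned} \tag{IV.39} $$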
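The K-L information quantity, the triangular factorization and the distance rule can be sketched in Python. The sketch uses the sample means and inverse covariance matrices quoted in the text; the Riemann-sum approximation of the integral, the function names and the small test densities are illustrative assumptions.

```python
import math

MEAN = {1: (0.5359, 0.5528), 2: (0.5082, 0.4485)}                  # sample means E(theta_g)
INV_COV = {1: [[121.776, -111.089], [-111.089, 108.417]],
           2: [[81.806, -90.029], [-90.029, 107.67]]}              # inverse covariance matrices

def kl_information(reference, density, d_lambda):
    """Riemann-sum approximation of the integral of reference * log(reference / density)."""
    return d_lambda * sum(r * math.log(r / w)
                          for r, w in zip(reference, density) if r > 0.0)

def cholesky_2x2(a):
    """Lower triangular L with a = L L^T for a symmetric positive definite 2 x 2 matrix."""
    l11 = math.sqrt(a[0][0])
    l21 = a[1][0] / l11
    l22 = math.sqrt(a[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def distance_sq(theta, g):
    """Squared distance of theta from population g, computed through the factor L."""
    low = cholesky_2x2(INV_COV[g])
    d = (theta[0] - MEAN[g][0], theta[1] - MEAN[g][1])
    u = (d[0] * low[0][0] + d[1] * low[1][0], d[1] * low[1][1])   # row vector d times L
    return u[0] ** 2 + u[1] ** 2

def classify(theta):
    """Return 1 or 2 according to the smaller squared distance."""
    return 1 if distance_sq(theta, 1) < distance_sq(theta, 2) else 2

L1 = cholesky_2x2(INV_COV[1])
L2 = cholesky_2x2(INV_COV[2])
print(L1, L2)
print(classify(MEAN[1]), classify(MEAN[2]))
```

Working with the triangular factor avoids a full matrix product for every new student: one multiplication by $L_g$ and a sum of squares give the distance.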


Another 21 students, whose populations are unknown, are used for checking our
method. The result is:

The correct percentage for population $g = 1$ is $P_1 = 86\%$, and for $g = 2$ it
is $P_2 = 77\%$.

At about the same time, many other methods based on other signal processing or
multivariate analysis techniques were tried on the same samples, and their
correct percentages were less than 60%. For example, our research group
considered the following 32 variates

{Re U^(\ k ),ReU^ ( A fc )； = 0, 1， 2, ... ， 15} ， （ IV.40) 

and made the stepwise discriminant analysis with the result

$$ P_1 = 51\%, \qquad P_2 = 54\%. \tag{IV.41} $$

Another stepwise discriminant analysis was also carried out for the following 48
variates

$$ \{ f_{11}(\lambda_k),\ \operatorname{Re} f_{12}(\lambda_k),\ f_{22}(\lambda_k);\ k = 0, 1, \ldots, 15 \}, \tag{IV.42} $$

where $\{f_{kj}(\lambda)\}$ are elements of the matrix (IV.21), and the correct
percentage is likewise less than 60%.

A rough contingency analysis of the results mentioned above may be carried out
as in Table IV.1.


Table IV.1

                      Real
Stat. Anal.      g = 1    g = 2    $n_{i\cdot}$
g = 1              26        5        31
g = 2              11       19        30
$n_{\cdot j}$      37       24      n = 61

(the entries of the table are the counts $n_{ij}$)

The statistic may be chosen as

$$ u = \frac{n\, (n_{11} n_{22} - n_{12} n_{21})^2}{n_{1\cdot}\, n_{\cdot 1}\, n_{2\cdot}\, n_{\cdot 2}} = 14.24. \tag{IV.43} $$

The critical value of the $\chi_1^2$-distribution for $\alpha = 0.001$ is

$$ \chi_1^2(0.001) = 10.83 < 14.24 = u, \tag{IV.44} $$

which shows that the classification by principal component analysis in the
frequency domain is highly efficient.
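The value of the statistic can be checked directly from the counts in Table IV.1. The Python sketch below uses the standard chi-square statistic for a 2 × 2 contingency table; the function name is an illustrative choice.

```python
def chi_square_2x2(table):
    """Chi-square statistic n * (n11*n22 - n12*n21)^2 / (row and column totals) for a 2 x 2 table."""
    (n11, n12), (n21, n22) = table
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    n = row1 + row2
    return n * (n11 * n22 - n12 * n21) ** 2 / (row1 * row2 * col1 * col2)

# Counts from Table IV.1: rows are the statistical decision, columns the real population.
u = chi_square_2x2([[26, 5], [11, 19]])
print(round(u, 2))
```

The statistic exceeds the critical value 10.83 quoted for $\alpha = 0.001$, in agreement with the conclusion drawn from the table.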

The second stage of our study was to extend the trial to different training
centers in China. Eventually, we obtained the VEP records of 101 members of
several centers, and the same methods were used to analyze the records with the
same population functions as in (IV.27). Even under different conditions and
with the different judgements of different drill teachers, we still have the
result

$$ P_1 = 80\%; \qquad P_2 = 65.4\%. \tag{IV.45} $$


4. Discussion 

An interesting point is that one training center in Beijing asked our research
group to give a judgement for each of 4 of their members on “their potential to
become a qualified aviator upon completion of their training”. According to our
analysis of their VEP records, our answer was “2 of them are ranked qualified
and the others are not”. Several months later, they came and said: “Your
judgement on those persons who are qualified is quite true; actually they are
outstanding students. Another judgement, on a poor student, is also correct, but
the last one is wrong. He is a good student, industrious in studying and
training, even though his response to each training subject is not as good as
that of those qualified”.

This is a very good explanation of why our correct percentage of judgement $P_2$
is less than $P_1$. In fact, our classification is based only on VEP records,
i.e. our discriminant analysis is based only on the physiological nature.
However, we all know that judging by tests whether a person belongs to $g = 1$
or $g = 2$ is a comprehensive conclusion of education and is not determined just
by his physiological characteristics. Even in graduate schools, we may see that
many outstanding students do have slow responses in their studies, but they are
very diligent and work hard, so that they can still be very successful.

In conclusion, we only want to say that the statistical correlation between VEP
and AI really exists, but it does not mean that a person who belongs to $g = 1$
will be a promising pilot in future, and vice versa.

Appendix IV


The following basic knowledge of multivariate stationary time series and
principal component analysis in the frequency domain is necessary for readers to
understand Case IV well. All of the proofs can be found in the books of Rozanov
(1967), Brillinger (1981) and Priestley (1981).

Definition IV.1. Let

$$ X(t) = (x_1(t), \ldots, x_n(t))^T, \quad t \in Z, \tag{IV.46} $$

be an n-variate time series with second-order moments

$$ E|x_k(t)|^2 < +\infty, \quad k = 1, 2, \ldots, n, \quad t \in Z. \tag{IV.47} $$

If the following conditions are fulfilled:

$$ \begin{aligned} &Ex_k(t) = a_k, \quad k = 1, 2, \ldots, n; \\ &E\left(x_k(t+m)\, \overline{x_j(t)}\right) = R_{kj}(m), \quad t, m \in Z; \ k, j = 1, 2, \ldots, n, \end{aligned} \tag{IV.48} $$

then $\{X(t)\}$ is called an n-variate stationary series, and

$$ a = (a_1, a_2, \ldots, a_n)^T, \qquad R(m) = (R_{kj}(m))_{1 \le k, j \le n}, \quad m \in Z, \tag{IV.49} $$

are called the mean and covariance matrix of $\{X(t)\}$ respectively.

For stationary series,

$$ R(0) \ge 0; \qquad R^*(t_1 - t_2) = R(t_2 - t_1) \tag{IV.50} $$

hold for the covariance matrix $R(m)$, where $R^*$ denotes the conjugate
transpose. Accordingly, even in the real case,

$$ R(t_1 - t_2) = R(t_2 - t_1) \tag{IV.51} $$

is no longer true in general.

Theorem IV.1. Suppose that $\{X(t)\}$ is an n-variate stationary series and
$R(m)$ is its covariance function; then there exists an $n \times n$ matrix
function

$$ F(\lambda) = (F_{kj}(\lambda))_{n \times n}, \quad \lambda \in \Pi, \tag{IV.52} $$

such that

$$ R(m) = \int_\Pi e^{im\lambda}\, dF(\lambda), \quad m \in Z, \tag{IV.53} $$

where $F_{kj}(\lambda)$ is a complex function of bounded variation, left
continuous on $[-\pi, \pi]$, with $F_{kj}(-\pi) = 0$.

If $dF_{kj}(\lambda) \ll d\lambda$, then the spectral density matrix exists:

$$ f(\lambda) = \frac{dF(\lambda)}{d\lambda} = \left( \frac{dF_{kj}(\lambda)}{d\lambda} \right)_{1 \le k, j \le n}. \tag{IV.54} $$

Sufficient conditions for the existence of (IV.54) are

$$ \sum_\tau |R_{kj}(\tau)| < +\infty, \quad k, j = 1, 2, \ldots, n, \tag{IV.55} $$

hence we have

$$ \begin{aligned} f_{kj}(\lambda) &= \overline{f_{jk}(\lambda)} = \frac{1}{2\pi} \sum_\tau R_{kj}(\tau)\, e^{-i\tau\lambda}, \quad \lambda \in \Pi; \\ R_{kj}(\tau) &= \overline{R_{jk}(-\tau)} = \int_\Pi e^{i\tau\lambda} f_{kj}(\lambda)\, d\lambda, \quad \tau \in Z. \end{aligned} \tag{IV.56} $$

Theorem IV.2. Suppose that $\{X(t)\}$ is an n-variate stationary series; then
$X(t)$ may be represented as a stochastic integral

$$ X(t) = \int_\Pi e^{i\lambda t}\, dZ_x(\lambda), \quad t \in Z, \tag{IV.57} $$

where

$$ dZ_x(\lambda) = (dZ_1(\lambda), dZ_2(\lambda), \ldots, dZ_n(\lambda))^T $$

is an n-variate orthogonal stochastic measure,

$$ E Z_k(\Delta_1)\, \overline{Z_l(\Delta_2)} = \int_{\Delta_1 \cap \Delta_2} dF_{kl}(\lambda), \tag{IV.58} $$

for $\forall \Delta_1, \Delta_2 \in B(\Pi)$, $k, l = 1, 2, \ldots, n$, and

$$ F(\lambda) = (F_{kl}(\lambda))_{n \times n}, \quad \lambda \in \Pi, $$

is the spectral matrix of $X(t)$.

Now, before introducing the theory of principal component analysis of time series,
we would like to make a brief remark on the nature of this theory.

Let the observations of two random variables $(\xi_1, \xi_2)$ be as in Fig. IV.3; then
it is evident that $\xi_1$ and $\xi_2$ are correlated random variables. If we make a
linear transformation to new variables $(\eta_1, \eta_2)$, then the main part of the
distribution lies along the $\eta_1$ axis, since we have the variance decomposition

$$\begin{aligned}
D &= \sum_t (\xi_1^{(t)} - \bar\xi_1)^2 + \sum_t (\xi_2^{(t)} - \bar\xi_2)^2 \\
  &= \sum_t (\eta_1^{(t)} - \bar\eta_1)^2 + \sum_t (\eta_2^{(t)} - \bar\eta_2)^2 \\
  &= \sigma_{\eta,1}^2 + \sigma_{\eta,2}^2,
\end{aligned} \tag{IV.59}$$

where

$$\sigma_{\eta,1}^2 > \sigma_{\eta,2}^2, \tag{IV.60}$$

and perhaps $\eta_1$ accounts for 80% of the variance D in (IV.59). In such a case we
may say that $\eta_1$ is the principal component of $\xi$, i.e. it possesses the maximum
variance. The following result can be found in any book on multivariate analysis.

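Theorem IV.3. Let $\xi = (\xi_1, \dots, \xi_n)^T$ be an n-variate random vector with
mean $E\xi = 0$ and variance matrix $\Sigma$; then the k-th principal component $\eta_k$
of $\xi$ is

$$\eta_k = C_k^T \xi, \quad k = 1, 2, \dots, n, \tag{IV.61}$$

where $C_k$ is the eigenvector corresponding to the eigenvalue $\lambda_k$ of the matrix
$\Sigma$, and $C_k^T C_k = 1$, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$.

Corollary 1. Keep the notations of Theorem IV.3; then

$$E\eta\eta^T = (C_k^T \Sigma C_j)_{n \times n}, \tag{IV.62}$$

where

$$C_k^T \Sigma C_j = \begin{cases} 0, & k \ne j, \\ \lambda_k, & k = j. \end{cases} \tag{IV.63}$$

Corollary 2. Consider the optimum linear regression of $\xi$ on the principal
components $(\eta_1, \eta_2, \dots, \eta_q)$, where $q < n$; then

$$\hat\xi = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1q} \\
a_{21} & a_{22} & \cdots & a_{2q} \\
\vdots & \vdots &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nq}
\end{pmatrix}
\begin{pmatrix} \eta_1 \\ \eta_2 \\ \vdots \\ \eta_q \end{pmatrix}
= A(\eta_1, \eta_2, \dots, \eta_q)^T, \tag{IV.64}$$

$$A = (C_1, C_2, \dots, C_q). \tag{IV.65}$$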


Based on the results (IV.61)-(IV.65), the alternative statement for the principal
components is as follows (see Brillinger (1981)):

Theorem IV.4. Let X be an n-variate random variable with mean $\mu$ and covariance
matrix $\Sigma$; then the optimum solution for the vector V and the matrices
$B_{q \times n}$, $C_{n \times q}$, $q < n$, which simultaneously minimize all the latent
values of

$$\Delta^2 = E(X - V - CBX)(X - V - CBX)^T \tag{IV.66}$$

are

$$V = \mu - CB\mu, \qquad B = (V_1, \dots, V_q)^T, \qquad C = B^T, \tag{IV.67}$$

where $V_j$ is the eigenvector corresponding to the j-th eigenvalue $\mu_j$ of the matrix
$\Sigma$, and $\mu_1 \ge \mu_2 \ge \dots \ge \mu_q$.

Under (IV.67), the regression error is

$$\Delta^2 = \sum_{j > q} \mu_j V_j V_j^T. \tag{IV.68}$$

Put the j-th principal component as

$$p_j = V_j^T X, \quad j = 1, 2, \dots, n, \tag{IV.69}$$

then

$$\mathrm{Cov}(p_j, p_k) = \begin{cases} 0, & j \ne k, \\ \mu_j, & j = k. \end{cases} \tag{IV.70}$$

In the time series case, the problem is to find two matrix series
$\{b(u)\}$ and $\{c(u)\}$ such that

$$\hat X(t) = \sum_u c(t - u)\zeta(u); \qquad \zeta(t) = \sum_u b(t - u)X(u), \tag{IV.71}$$

minimize the error

$$\Delta^2 = E\{(X(t) - \hat X(t))^T (X(t) - \hat X(t))\}. \tag{IV.72}$$

Similar to Theorem IV.4, we have

Theorem IV.5. Suppose that $\{X(t)\}$ is a stationary n-variate series whose
covariance matrix R(u) satisfies the condition*

$$\sum_u \|R(u)\| < +\infty, \tag{IV.73}$$

and whose spectral matrix is $f(\lambda)$; then the optimum b(u) and c(u) minimizing the
error (IV.72) are

$$b(u) = \frac{1}{2\pi}\int_\Pi B(\lambda)e^{iu\lambda}\, d\lambda, \qquad
c(u) = \frac{1}{2\pi}\int_\Pi C(\lambda)e^{iu\lambda}\, d\lambda, \tag{IV.74}$$

where $B(\lambda) = C^*(\lambda) = (V_1(\lambda), V_2(\lambda), \dots, V_q(\lambda))^T$, and
$V_j(\lambda)$ is the j-th eigenvector, corresponding to the eigenvalue $\mu_j(\lambda)$ of
the matrix $f(\lambda)$, with
$\mu_1(\lambda) \ge \mu_2(\lambda) \ge \dots \ge \mu_n(\lambda) \ge 0$.

Moreover, the minimum error of (IV.72) may be represented as

$$\min \Delta^2 = \int_\Pi \sum_{j > q} \mu_j(\lambda)\, d\lambda, \tag{IV.75}$$

the j-th principal component of the series $\{X(t)\}$ is

$$\zeta_j(t) = \int_\Pi e^{i\lambda t}\, V_j^*(\lambda)\, dZ_X(\lambda), \tag{IV.76}$$

where $dZ_X$ is the orthogonal stochastic measure of X(t), $\{\zeta_j(t)\}$ are independent
series, and their spectral matrix is

$$f_\zeta(\lambda) = \begin{pmatrix}
\mu_1(\lambda) & 0 & \cdots & 0 \\
0 & \mu_2(\lambda) & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \mu_q(\lambda)
\end{pmatrix}. \tag{IV.77}$$

*$\|R\|^2 = \mathrm{tr}\{RR^*\}$, and (IV.73) ensures the existence of the spectral density matrix $f(\lambda)$.

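Starting from N observations $\{X(k), k = 1, 2, \dots, N\}$, the spectral matrix
$f(\lambda)$ may be estimated in the following way (see Priestley (1981)):

$$\hat f(\lambda) = (\hat f_{kj}(\lambda))_{1 \le k,j \le n}, \tag{IV.78}$$

where

$$\begin{aligned}
\hat f_{kj}(\lambda) &= \frac{1}{2\pi}\sum_{s=-(N-1)}^{N-1} w_{kj}^{(N)}(s)\,\hat R_{kj}(s)\,e^{-is\lambda}, \\
\hat R_{kj}(s) &= \begin{cases}
\dfrac{1}{N}\displaystyle\sum_{l=1}^{N-s} x_k(l + s)x_j(l), & s = 0, 1, 2, \dots, N - 1, \\
\dfrac{1}{N}\displaystyle\sum_{l=1-s}^{N} x_k(l + s)x_j(l), & s = -1, -2, \dots, 1 - N,
\end{cases}
\end{aligned} \tag{IV.79}$$

and $w_{kj}^{(N)}(s)$ are the spectral window coefficients in the time domain; they may
be taken as a function of s as presented in Chapter 3, e.g. the Bartlett, Hamming
or Parzen windows.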
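As a rough numerical sketch of the estimator (IV.78)-(IV.79) and of the eigen-decomposition used in Theorem IV.5, the following Python code may help. It is only an illustration under stated assumptions: the function names, the choice of a Bartlett lag window of width M shared by all pairs (k, j), and the use of real-valued data are ours and are not taken from the text.

```python
import numpy as np

def cross_covariance(x, s):
    """Biased lag-s cross-covariance matrix estimate, in the spirit of (IV.79).

    x has shape (N, n): N time points of an n-variate real series.
    Entry (k, j) is (1/N) * sum_t x_k(t + s) * x_j(t).
    """
    N = x.shape[0]
    if s >= 0:
        return x[s:].T @ x[:N - s] / N
    return x[:N + s].T @ x[-s:] / N

def spectral_matrix(x, lam, M):
    """Lag-window estimate of f(lam); Bartlett window of width M (assumed choice)."""
    n = x.shape[1]
    f = np.zeros((n, n), dtype=complex)
    for s in range(-M + 1, M):
        w = 1.0 - abs(s) / M                      # Bartlett lag-window weight
        f += w * cross_covariance(x, s) * np.exp(-1j * s * lam)
    return f / (2.0 * np.pi)

def principal_spectra(x, lam, M):
    """Eigenvalues mu_1 >= ... >= mu_n and eigenvectors of the estimated f(lam)."""
    f = spectral_matrix(x, lam, M)
    f = (f + f.conj().T) / 2.0                    # remove rounding asymmetry
    mu, V = np.linalg.eigh(f)                     # eigh returns ascending order
    order = np.argsort(mu)[::-1]
    return mu[order], V[:, order]
```

Calling `principal_spectra(x, lam, M)` over a grid of frequencies gives estimated curves $\mu_j(\lambda)$; the leading q columns of V play the role of $V_1(\lambda), \dots, V_q(\lambda)$.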

However, to simplify the calculation, we may select

$$w_{kj}^{(N)}(s) = w^{(N)}(s), \quad s = 0, \pm 1, \pm 2, \dots, \pm(N - 1). \tag{IV.80}$$

CASE V


Periodicity Analysis of LH Release in Isolated 
Pituitary Gland by Hidden Frequency Analysis 


1. Introduction 


This case study reveals an interesting phenomenon in endocrinology through the
statistical analysis of stochastic processes. The problem arose from basic research
in physiology. To date it has been widely accepted that the pituitary gland releases
luteinizing hormone (LH) in a pulsatile pattern driven by the rhythmic discharge
of luteinizing hormone releasing hormone (LHRH) from the brain (hypothalamus).
Some results of our research led us to clarify this concept.

The following experiment was conducted at the Shanghai Institute of Planned
Parenthood Research in 1984-1986, and the statistical analysis was carried out in
the Statistical Laboratory of Peking University.*

The pituitary gland of a Sprague-Dawley rat was removed and incubated in Tc-199
medium, and the LH release of the isolated pituitary gland was investigated.

The purpose of this study is to find out whether the release rhythm still occurs
under such conditions and, if the rhythm really exists, whether there is any
difference between the proestrous rats and the lactating rats.

Some of the observation curves are shown in Fig. V.1. It is apparent that no
worthwhile conclusion can be drawn from these records intuitively. Indeed, it is
not easy to detect the characteristics of each experiment from the curves by the
usual statistical methods either.

Fortunately, when these curves are considered as realizations of random processes,
statistical analysis of time series yields some very interesting results: even in
the case of the isolated pituitary gland the rhythm of the release still exists,
though the rhythm of lactating rats is slower than that of proestrous rats.


*Other researchers in this case study are Professor Hsieh Chungming, Dr. Gu Peide and
Professor Ye Kangsheng. An abstract of the research can be found in "Cybernetics and
Systems '90", edited by R. Trappl, World Scientific (1990).

Fig. V.1 LH release curves of isolated pituitary glands of Sprague-Dawley rats.

2. Statistical Analysis of LH Release 

The first stage of our research on the characteristics of these records uses
linear regression analysis, since some features of a process may be carried by the
parameters of the fitted linear regression function. More specifically, the
characteristics of the release might be reflected in the slopes and the constant
terms of the linear regression functions.

Mathematically, we may assume that the linear model is

$$Y = X\beta + \varepsilon, \tag{V.1}$$

where Y (n x 1) is an observation, X (n x p) is a known matrix, $\beta$ (p x 1) is an
unknown parameter vector, and $\varepsilon$ (n x 1) is the residual with distribution
$N(0, \sigma^2 I_n)$. The linear model based upon least squares estimation (LSE) is

$$Y = X\hat\beta + e, \tag{V.2}$$

$$e = Y - X\hat\beta, \tag{V.3}$$

$$\hat\beta = (X^T X)^{-1} X^T Y. \tag{V.4}$$

Since we may rewrite $\hat\beta$ as

$$\hat\beta = \beta + (X^T X)^{-1} X^T \varepsilon, \tag{V.5}$$

we have

$$E\hat\beta = \beta + (X^T X)^{-1} X^T E\varepsilon = \beta \quad (E\varepsilon = 0). \tag{V.6}$$

This means that $\hat\beta$ is an unbiased estimate of the true $\beta$.

Furthermore, linear model theory shows (see Dunteman (1984)) that

$$\hat\beta \sim N(\beta, \mathrm{Cov}(\hat\beta)). \tag{V.7}$$

In particular, for $\hat\beta_i$ we have

$$\hat\beta_i \sim N(\beta_i, \sigma_i^2), \tag{V.8}$$

where $\sigma_i^2 = \sigma^2 a_{ii}$ and $a_{ii}$ is the i-th diagonal element of $(X^T X)^{-1}$.

Now, for each LH release curve, we may fit a linear model as in (V.1) and estimate
the parameters by LSE.

For instance, for a group of lactating Sprague-Dawley rats whose pituitary glands
were removed and incubated in GnRH (25 ng/ml) Tc-199 medium for 30 minutes, the
average LH release is as given in Table V.1.

Table V.1

    Time        LH        Time        LH
  (x5 min.)             (x5 min.)
      1        8.93        13        7.65
      2        8.92        14        8.52
      3        8.13        15        7.62
      4        8.62        16        7.22
      5        9.32        17        5.48
      6        9.50        18        6.22
      7        9.35        19        6.15
      8        9.15        20        6.10
      9        9.57        21        6.03
     10        9.08        22        7.15
     11        8.87        23        6.85
     12        8.78        24        7.43

The graph of the record is shown in Fig. V.2.

Fig. V.2 Average LH release record of lactating rats

Now, we put the univariate linear model of (V.1) as

$$y = \beta_0 + \beta_1 x, \tag{V.9}$$

and the LSEs are

$$\hat\beta_1 = \frac{l_{xy}}{l_{xx}} = -0.1412, \qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 9.708, \tag{V.10}$$

where $l_{xy} = \sum_i (x_i - \bar x)(y_i - \bar y)$ and $l_{xx} = \sum_i (x_i - \bar x)^2$,
so that the fitted linear regression model is

$$y = 9.708 - 0.1412x, \quad x = 1, 2, \dots, 24. \tag{V.11}$$
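The fit (V.9)-(V.11) and the detrending used later in this Case can be sketched in a few lines of Python. This is only an illustration: the helper name `fit_line` is ours, and the data are typed in from Table V.1, so small differences in the last digits from the printed coefficients are to be expected.

```python
import numpy as np

# Average LH release of the lactating group (Table V.1), one value per 5 minutes.
lh = np.array([8.93, 8.92, 8.13, 8.62, 9.32, 9.50, 9.35, 9.15, 9.57, 9.08,
               8.87, 8.78, 7.65, 8.52, 7.62, 7.22, 5.48, 6.22, 6.15, 6.10,
               6.03, 7.15, 6.85, 7.43])
x = np.arange(1, lh.size + 1, dtype=float)

def fit_line(x, y):
    """Least-squares intercept and slope, with slope = l_xy / l_xx."""
    dx = x - x.mean()
    slope = np.dot(dx, y - y.mean()) / np.dot(dx, dx)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

beta0, beta1 = fit_line(x, lh)

# Detrended series: observed value minus the fitted regression line.
residual = lh - (beta0 + beta1 * x)
```

The array `residual` is the kind of trend-removed record that the periodicity analysis below works with.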

In the research work, two different groups were analyzed, and the parameters of
the fitted linear models are listed in Table V.2.


Table V.2

  Sample                            1        2        3        4        5        6
  Group I   $\hat\beta_0^{(1)}$   15.462   15.824   11.093   13.840   16.895   14.303
  Group I   $\hat\beta_1^{(1)}$   -0.404   -0.563   -0.315   -0.497   -0.485   -0.324
  Group II  $\hat\beta_0^{(2)}$   15.913   18.410   15.020   14.179   14.995   13.584
  Group II  $\hat\beta_1^{(2)}$   -0.482   -0.730   -0.580   -0.465   -0.420   -0.191


From Table V.2 we find

$$\begin{aligned}
\bar\beta_0^{(1)} &= 14.570, & S_{0,I}^2 &= 3.411, \\
\bar\beta_1^{(1)} &= -0.4313, & S_{1,I}^2 &= 8.39 \times 10^{-3}; \\
\bar\beta_0^{(2)} &= 15.350, & S_{0,II}^2 &= 2.40, \\
\bar\beta_1^{(2)} &= -0.478, & S_{1,II}^2 &= 2.66 \times 10^{-2}.
\end{aligned} \tag{V.12}$$

According to the conclusion (V.8), we know that
$\{\hat\beta_i^{(j)}, i = 0, 1; j = 1, 2\}$ follow normal distributions. So we may apply
the t-test for the means of two different populations,

$$H_0: \beta_1^{(1)} = \beta_1^{(2)} \quad \text{(with common variance unknown)},$$

for which the statistic is

$$T = \frac{\bar\beta_1^{(1)} - \bar\beta_1^{(2)}}{\sqrt{n_1 S_{1,I}^2 + n_2 S_{1,II}^2}}
\sqrt{\frac{n_1 n_2 (n_1 + n_2 - 2)}{n_1 + n_2}},$$

which follows a t-distribution with $n_1 + n_2 - 2$ degrees of freedom.

Now, $n_1 = n_2 = 6$, and

$$\begin{aligned}
\bar\beta_1^{(1)} &= -0.431, & \bar\beta_1^{(2)} &= -0.478; \\
S_{1,I}^2 &= 8.39 \times 10^{-3}, & S_{1,II}^2 &= 2.66 \times 10^{-2}; \\
T &= 1.205.
\end{aligned} \tag{V.13}$$

Under the significance level $\alpha = 0.05$, we have

$$t_{n_1 + n_2 - 2}(0.05) = 2.228, \tag{V.14}$$

and since

$$T = 1.205 < 2.228 = t_{10}(0.05), \tag{V.15}$$

we cannot reject the hypothesis that the two populations of linear regression
functions have the same slope.

A similar statistical test can also be carried out for $\beta_0^{(1)}$ vs.
$\beta_0^{(2)}$, and again $H_0: \beta_0^{(1)} = \beta_0^{(2)}$ cannot be rejected.

These results show that the characteristics distinguishing the two groups lie
neither in the slopes $\{\beta_1\}$ of LH release nor in the mean levels $\{\beta_0\}$.
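The two-sample comparison of slopes can be sketched as follows. This is a hedged illustration only: it assumes the classical pooled-variance t statistic with sample variances taken with divisor n (which is what reproduces the $S^2$ values of (V.12)), the function name `pooled_t` is ours, and the critical value is simply the one quoted in (V.14). The number it returns depends on that convention and is not meant to reproduce (V.13) digit for digit; only the decision is of interest here.

```python
import numpy as np

# Fitted slopes of the six samples in each group (Table V.2).
slopes_1 = np.array([-0.404, -0.563, -0.315, -0.497, -0.485, -0.324])
slopes_2 = np.array([-0.482, -0.730, -0.580, -0.465, -0.420, -0.191])

def pooled_t(a, b):
    """Two-sample t statistic under a common unknown variance (assumed form)."""
    n1, n2 = a.size, b.size
    s1, s2 = a.var(), b.var()        # divisor n, matching S^2 in (V.12)
    scale = np.sqrt(n1 * n2 * (n1 + n2 - 2) / (n1 + n2))
    return (a.mean() - b.mean()) / np.sqrt(n1 * s1 + n2 * s2) * scale

t_value = pooled_t(slopes_1, slopes_2)
critical = 2.228                     # two-sided 5% point for 10 d.f., from (V.14)
reject = abs(t_value) > critical
```

With these data `reject` is false, in agreement with the conclusion drawn in the text.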

Another possibility is that the features remain hidden in the fluctuation periods
of the observation curves. Therefore, we first remove the trend of each curve by
subtracting the value of the regression function (V.9) at x from the corresponding
LH value, to obtain a comparatively stationary series.

For example, subtracting the values of the regression function (V.11) from the
corresponding values in Table V.1 gives the values in Table V.3 (see also
Fig. V.3).

Then the problem can be formulated as the hidden periodicity analysis of time
series. In this field many methods have been suggested by statisticians and
researchers, e.g. Grenander and Rosenblatt (1957) considered the model

$$x(t) = \sum_{k=1}^{P} A_k \cos(\omega_k t + \varphi_k) + \varepsilon(t), \tag{V.16}$$

Table V.3


    Time         LH         Time         LH
  (x5 min.)               (x5 min.)
      1       -0.6426        13       -0.2225
      2       -0.5110        14        0.7892
      3       -1.1593        15        0.0309
      4       -0.5276        16       -0.2275
      5        0.3141        17       -1.8258
      6        0.6358        18       -0.9441
      7        0.6274        19       -0.8724
      8        0.5694        20       -0.7807
      9        1.1308        21       -0.7091
     10        0.7825        22        0.5526
     11        0.7141        23        0.3943
     12        0.7658        24        1.1600

Fig. V.3 The LH record after removal of the trend represented by the linear
regression function

where P is assumed to be known, $\{A_k, \omega_k, k = 1, 2, \dots, P\}$ are unknown
parameters, $\{\varepsilon(t)\}$ is an i.i.d. $N(0, \Sigma)$ series (see §3.3.1 in
Chapter 3), and the detection of $\{A_k, \omega_k\}$ is based on hypothesis testing.

Pisarenko (1972) suggested a model of the form

$$y_n = \sum_{k=1}^{P} A_k \sin(2\pi n f_k + \theta_k) + w_n, \tag{V.17}$$

where

$$\theta_k \sim \text{i.i.d. } U(-\pi, \pi), \tag{V.18}$$

$$w_n \sim \text{i.i.d. } N(0, \Sigma). \tag{V.19}$$

In China, many time series analysts have investigated the hidden periodicity
problem in the last decade, and several important results have been obtained
(see Chen & Xie (1989) or An & Xie (1991)). These results widely generalize the
models (V.16) and (V.17).

For example, He, S. Y. (1987) has investigated the model

$$x(t) = \sum_{k=1}^{P} \ldots + \eta(t), \quad t \in Z, \tag{V.20}$$

where $\{\xi_n\}$ are bounded random variables, P is an unknown parameter and $\eta(t)$
is a weak-P model, so that correlated stationary series may be included (see §3.3.1,
Chapter 3). He has given an algorithm for detecting the hidden frequencies
$\{\lambda_n\}$ and the amplitudes $\{A_n\}$ (we will briefly call it the HSY algorithm
in the sequel).

However, the HSY algorithm is based on the asymptotic theory of large samples of
time series, and in practical applications one is often troubled by the choice of
a suitable value of $\gamma$. Recall that He defines the statistic

$$J_N(\lambda) = N^{-31/16}\left|\sum_{n=1}^{N} x(n)e^{-i\lambda n}\right|^2, \quad \lambda \in \Pi, \tag{V.21}$$

where N is the sample size, and the estimation of the order P in (V.20) amounts,
roughly speaking, to finding the number of events

$$NB_k = \{\lambda : |\lambda - \lambda_k| < \delta,\ J_N(\lambda) > \gamma\}, \quad k = 1, 2, \dots, P, \tag{V.22}$$

for given $\delta > 0$ (see Chapter 3).

Since in a practical problem the sample size N is usually fixed, the value of
$\gamma$ evidently cannot be set as freely as in the theory. In fact, the event
$\{J_N(\lambda) > \gamma\}$ will not happen when $\gamma$ is chosen too large; conversely,
when $\gamma$ is selected to be rather small, the P of (V.22) may be overestimated,
since too many pseudo peaks of $\{J_N(\lambda)\}$ will appear (see Fig. V.4).
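The role of the threshold $\gamma$ can be illustrated with a small Python sketch. It is not the HSY algorithm itself, only a grid evaluation of the statistic (V.21) followed by a search for local maxima above a user-supplied $\gamma$; the function names, the grid size and the peak-picking rule are assumptions made for this illustration.

```python
import numpy as np

def hsy_statistic(x, lams):
    """J_N(lam) = N**(-31/16) * |sum_{n=1}^{N} x(n) exp(-i*lam*n)|**2, cf. (V.21)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    n = np.arange(1, N + 1)
    dft = np.exp(-1j * np.outer(lams, n)) @ x
    return N ** (-31.0 / 16.0) * np.abs(dft) ** 2

def candidate_frequencies(x, gamma, grid_size=2048):
    """Local maxima of J_N on a grid over (0, pi] that exceed the threshold gamma."""
    lams = np.linspace(np.pi / grid_size, np.pi, grid_size)
    J = hsy_statistic(x, lams)
    peaks = [i for i in range(1, grid_size - 1)
             if J[i] > gamma and J[i] >= J[i - 1] and J[i] >= J[i + 1]]
    return lams[peaks], J[peaks]
```

Raising `gamma` makes the returned set shrink and eventually become empty, while lowering it lets pseudo peaks through, which is the trade-off described above. For a record sampled every 5 minutes, a detected frequency $\lambda$ corresponds to a period of $5 \cdot 2\pi/\lambda$ minutes.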

In practice, we often choose $\gamma$ as

$$\gamma = \frac{1}{4}\left(\frac{1}{N}\sum_{n=1}^{N} x^2(n)\right), \qquad \delta \approx 0.1 \tag{V.23}$$

(see Chen & Xie (1989)).

Fig. V.4 $\hat P_{N,1} = 0$ for $\gamma_1$ and $\hat P_{N,2} > P$ for $\gamma_2$

For small or medium-sized samples, no satisfactory method for detecting hidden
frequencies has been found to date.

In comparing the merits of these methods, we find that some of the real
frequencies are often missed by the test of Grenander & Rosenblatt, whereas one
merit of the HSY method is that all of the real frequencies $R = \{\lambda_k\}$ will be
contained in the set of estimated frequencies

$$\hat R = \{\hat\lambda_j : j = 1, 2, \dots, \hat P\}, \tag{V.24}$$

namely,

$$R \subset \hat R \tag{V.25}$$

for N not sufficiently large. Perhaps a combination of several methods (hybrid
style) is suitable for applications, e.g. starting from (V.24) obtained by the HSY
algorithm, we may make a further selection in $\hat R$ to obtain reasonable hidden
periodicity estimates.


3. Practical Rhythm Analysis of LH Release 

There are four groups of experiments for the rhythm analysis of LH release from
isolated pituitary glands of Sprague-Dawley rats:

Group A. Proestrous rats.

Group B. Lactating rats.

Group C. Ovariectomized rats.

Group D. Ovariectomized rats whose pituitaries are incubated in GnRH (25
ng/ml) Tc-199 medium for 30 minutes.

The LH release is recorded for 120 minutes for each rat, and the following
statistical analysis is carried out on the observed data.

Step 1. Remove the trend T(t) by means of the linear regression function (V.9), say

$$T(t) = \hat\beta_0 + \hat\beta_1 t, \tag{V.26}$$

so that a new series

$$z(t) = x(t) - T(t), \quad t = 1, 2, \dots, N \tag{V.27}$$

is obtained, where x(t) is the original record.

Step 2. Consider z(t) as a weak-P linear model (V.20),

$$z(t) = \sum_{n=1}^{P} A_n \cos(\lambda_n t + \theta_n) + \eta(t), \tag{V.28}$$

where

$$P, \quad \{A_n, \theta_n, \lambda_n, \ n = 1, 2, \dots, P\} \tag{V.29}$$

are unknown parameters and $\eta(t)$ is a stationary correlated residual series
satisfying the weak-P condition (see Chapter 3).

Then we may apply the hidden periodicity analysis to the series z(t),
t = 1, 2, ..., N. The algorithm is a hybrid one, i.e. it is based on HSY and then
connected to other algorithms.

The main results for four groups of experiments (A)-(D) are as follows: 

(1). For group A, i.e. proestrous rats, the average rhythm period is T = 27
(minutes).

(2). For group B, i.e. lactating rats, the average rhythm period is T = 61 (minutes).

(3). The average periods of groups C and D are intermediate between those of
groups A and B.

These results show that even in the case of the isolated pituitary gland the rhythm
of the LH release still exists. Interestingly, the rhythm of lactating rats is much
slower than that of proestrous rats.

(4). Another interesting point is that when we put all of the detected vectors

$$\{(T_n, A_n), \ n = 1, 2, \dots, P\}$$

of the four groups A, B, C and D together, such as

Group A: (27, 1.53), ..., (40, 2.22).

Group B: (80, 1.07), ..., (40, 0.82).

Group C: (34, 1.01), ..., (48, 1.87).

Group D: (27, 1.65), ..., (40, 1.52).

we obtain Figures V.5-V.6, which show that groups A and B are distributed in two
distinct regions (I) and (II), whereas groups C and D are not clearly separated on
the plane.



Fig. V.6 The distribution of detected vectors for group C vs. group D
(horizontal axis: T (min))

4. Discussion 

The real function of the intrinsic pituitary rhythm in the body is unknown, but
the frequencies obtained from the experiment seem to match the different estrous
phases of reproductive activity in the intact animal. For instance, the high
frequency of LH release from the proestrous rat fits the need for the huge release
of LH, the so-called LH surge, at the proestrous phase in the intact animal; the
lactating rat has a predominant prolactin level, which might reduce the LH
release; in other words, in such circumstances the pituitary gland could modulate
its frequency to lower the LH output. In the ovariectomized rat the negative
feedback system in the pituitary gland is abnormal, so its frequencies of LH
release and of the response to LHRH stimulation are irregular.

The above statement is only a conjecture about the relation between the frequency
of LH release in the intact animal and in the incubation system. It is certainly
too simple to explain such complex physiological events; the detailed intrinsic
rhythmic function and mechanisms remain to be clarified.

Now, compare the two samples $z_P(t)$ and $z_L(t)$ of groups A and B, which are shown
in Fig. V.7 and Fig. V.3.

Fig. V.7 A record of $z_P(t) = x_P(t) - T_P(t)$

It is apparent, even intuitively, that the curve in Fig. V.7 contains more high
frequencies, while Fig. V.3 mainly has a strong low-frequency component.

Other spectral analysis methods were also carried out for comparison. For
proestrous rats, the data of Fig. V.7 are listed in Table V.4.

Spectral estimates with Parzen and Hamming windows (see Chapter 3) are shown
in Fig. V.8 and V.9.

Fig. V.8 Spectral estimates for $z_P(t)$ with Parzen window

Fig. V.9 Spectral estimates for $z_P(t)$ with Hamming window

The first peak in Fig. V.8 is

$$T_P^{(1)} = 80 \text{ (min)}; \qquad f\left(\frac{2\pi}{T_P^{(1)}}\right) = 0.174.$$

The second peak is

$$T_P^{(2)} = 40 \text{ (min)}; \qquad f\left(\frac{2\pi}{T_P^{(2)}}\right) = 0.151,$$

and the third is

$$T_P^{(3)} = 16.8 \text{ (min)}; \qquad f\left(\frac{2\pi}{T_P^{(3)}}\right) = 0.146.$$

The first peak in Fig. V.9 is ...

The second peak is

$$T_H^{(2)} = 36 \text{ (min)}; \qquad f\left(\frac{2\pi}{T_H^{(2)}}\right) = 0.133,$$

and the third is

$$T_H^{(3)} = 16.8 \text{ (min)}; \qquad f\left(\frac{2\pi}{T_H^{(3)}}\right) = 0.107.$$

Similarly, for the record $z_L(t)$ of lactating rats, we may also calculate the
spectrum by window spectral estimation with the Parzen and Hamming windows. The
results are shown in Fig. V.10 and V.11.

Both the Parzen and the Hamming estimates have only one peak, namely

$$\begin{aligned}
T_P &= 80 \text{ (min)}; & f\left(\frac{2\pi}{T_P}\right) &= 0.605, \\
T_H &= 80 \text{ (min)}; & f\left(\frac{2\pi}{T_H}\right) &= 0.721.
\end{aligned}$$

We may see that spectral estimation with the Parzen and Hamming windows leads to
the same conclusion: there are three frequency components in the record $z_P(t)$,
and mainly one low-frequency component in $z_L(t)$, even though there are some
slight differences between the two estimates.

Another calculation, for comparison, is the M.E. (maximum entropy) spectral
estimate (see Chapter 2). For the record $z_L(t)$, the M.E. estimation is carried out
by the Marple algorithm with HIC order selection (see (2.164), Chapter 2).

Fig. V.10 Spectral estimates by Parzen window for LH release of lactating rats.

Fig. V.11 Spectral estimates by Hamming window for lactating rats.

The fitted model is

$$z(n) - 0.576z(n-1) - 0.196z(n-2) + 0.135z(n-3) - 0.289z(n-4) + 0.510z(n-5) = 0.438\varepsilon(n)$$

and the spectral estimate is shown in Fig. V.12.
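To see how a dominant period is read off such a fitted autoregression, one may evaluate the AR spectral density on a frequency grid. The sketch below is only illustrative: it assumes the usual form $\sigma^2 / (2\pi |A(e^{-i\lambda})|^2)$ with the coefficients typed in from the model above, the 5-minute sampling interval of the LH records, and variable names of our own choosing. The grid maximum it finds need not coincide exactly with the peak read from Fig. V.12.

```python
import numpy as np

# Coefficients in the convention z(n) + a1*z(n-1) + ... + a5*z(n-5) = sigma*e(n),
# read off the fitted model in the text.
a = np.array([-0.576, -0.196, 0.135, -0.289, 0.510])
sigma = 0.438
dt_minutes = 5.0                       # sampling interval of the LH records

def ar_spectrum(lam, a, sigma):
    """AR spectral density sigma^2 / (2*pi*|A(e^{-i*lam})|^2) (assumed normalisation)."""
    k = np.arange(1, a.size + 1)
    A = 1.0 + np.exp(-1j * np.outer(lam, k)) @ a
    return sigma ** 2 / (2.0 * np.pi * np.abs(A) ** 2)

lam = np.linspace(0.01, np.pi, 4000)   # frequency grid in radians per sample
f = ar_spectrum(lam, a, sigma)
lam_peak = lam[np.argmax(f)]
period_minutes = dt_minutes * 2.0 * np.pi / lam_peak
```

The value `period_minutes` is the period, in minutes, of the strongest spectral component of the fitted model.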



Fig. V.12 Spectral estimates by M.E. with Marple algorithm and HIC order selection

The peak is

$$T_{M.E.} = 80 \text{ (min)}; \qquad f\left(\frac{2\pi}{T_{M.E.}}\right) = 0.824,$$

which is almost the same as that of the window spectral estimation.

CASE VI

Statistical Detection of Uranian Ring Signals 
from the Light Curve of Photoelectric Observation* 


1. Introduction 

The discovery that Uranus has rings is a great event in our understanding of the
solar system. Since the light of the Uranian rings is extremely weak, no one had
ever seen them, even with the most powerful telescopes available. However, on
March 10, 1977, an unusual event happened: scientists of the United States, China,
India, South Africa, etc. unexpectedly discovered, from the occultation of the star
SAO 158687 (spectrum K5, visual magnitude $8^m.8$) by Uranus, that Uranus has rings
just as Saturn has.

The best-known observation was made by Elliot et al. in 1977 at the KAO (Kuiper
Airborne Observatory); their records show (see Fig. VI.1) that Uranus has five
rings, which are named, in order of their distance from the center of Uranus,
$\alpha$, $\beta$, $\gamma$, $\delta$ and $\varepsilon$ respectively.

Fig. VI.1 Light curves showing the five brief occultations observed both before
and after the occultation of SAO 158687 by Uranus on 10 March 1977 (from Elliot
J.L. et al., Nature 267, 26 May 1977). Time axes: minutes after 20 h and minutes
after 21 h.

Such an occultation had never been observed before 1977. Since it would provide
good information about the diameter and atmosphere of the planet, Chinese
researchers decided to observe it in March 1977. Observations were prepared in
both Nanjing and Beijing. Unfortunately, the observation in Nanjing was frustrated
by heavy rain on the night of March 10, while in Beijing the observers from both
the Purple Mountain Observatory and the Peking Observatory successfully saw the
phenomenon. Although an occultation of the star by Uranus itself was not detected
in Beijing, several secondary occultations were observed unexpectedly. The obvious
deep drop in the light curve showed that it was caused by the occultation of the
main ring. The signals of short occultations by the other rings were detected from
noise by statistical methods at Peking University. The statistical analysis is
based upon a genuine, correct observation record.


*In this Case, we shall briefly describe the discovery of Uranian rings by statistical analysis in
China in 1977-1980. As we know, Voyager 2 photographed the rings of Uranus several years
ago (see Murray & Thompson (1990) and Fig. VI.7), so that the information on the rings is now
clearer than before. Since our research work in this Case is a historical review of a past event,
we mainly preserve the results as in the original reports. The later papers can be
found in Chen et al. (1978), (1980), Xie and Cheng (1983).

(1). The equipment used. A 60 cm reflector with a Cassegrain focal length of 9
meters; a photoelectric system composed of an E.M.I. 9558 QB tube mounted behind
an interference filter centered at 7,200 Å with half-width 140 Å. The
characteristics of the recording system are: recording speed, 10 cm/min; time
accuracy, within $0^s.5$.

(2). Observation procedure. Data recording commenced at $19^h47^m13^s$ U.T. and
ended at $21^h50^m30^s$ U.T. At the beginning of the observation we measured the
intensities of Uranus and of the occulted star separately, while they could still
be separated in the diaphragm.

During the observation we carefully measured the skylight and dark current, and
after astronomical twilight had set in we measured the skylight frequently.

(3). Observation results. Two distinct secondary occultations were observed, and
they corresponded to two occultations by the $\varepsilon$-ring (see Fig. VI.2a and
Fig. VI.2b). The mid-time of the first occultation is $20^h18^m41^s$ U.T. on March
10, with a duration of $9^s$. The possible second occultation began at
$21^h47^m36^s$ U.T., the record being interrupted by center-checking after $8^s$.
The scales of figures VI.2 and VI.3 are different. In order to reduce them to the
same scale, all the values of intensity in Fig. VI.2b must be multiplied by a
factor of 1.6. The values of time shown in all figures should be corrected for
$\Delta T = 1^s.2$.



Fig. VI.2a The first occultation observed for the main ring.

A luminosity drop with a duration of about 14 min centered at $21^h01^m$, with a
maximum drop of $0^m.05$, was also recorded.

Noting that the intensity of the occulted star amounted to about 21% of the total
intensity of the occulting and occulted objects, we obtained the optical thickness
$\tau_1 = 1$ from the pre-immersion occultation by the main ring, whereas the
optical thickness corresponding to the possible post-emersion occultation by the
main ring amounted to only $\tau_2 = 0.4$, which is very different from the value
for the first event.

Fig. VI.2b The possible second occultation observed for the main ring.

From the observational results of Elliot et al. at the KAO, we know that there are
rings other than the $\varepsilon$-ring (i.e. the main ring). Signals of these rings
should perhaps have been included in our record. But since we had no near-infrared
equipment or suitable filter, and the photometer used was only a conventional one,
the SNR values of these rings are comparatively small. The signals were buried in
noise (see Fig. VI.3) and could not be detected by simple methods; several
numerical analyses, such as the t-test and analysis of variance, were carried out
on the observed data but failed.


2. Statistical detection of weak ring signals from the noise background 

In order to obtain a reliable result, the most important thing is to know whether
those weak ring signals really exist in our record, since statistical detection
must be based on reality.

Astronomers confirmed that those ring signals should be there on two accounts:

(1). The main ring signal appears on the record, and the observation is a
continuous process, so the other ring signals must have been recorded too. Because
the SNR is very small, those ring signals are not strong enough to be seen by eye.

(2). Experiments were repeated with the same recording system at the Peking
Observatory for different luminosities; the signals all appear in the records, but
some of them are difficult to discriminate because of their low luminosity.

Fig. VI.3 The observation record of weak ring signals

According to our experience in the study of Case I, for the convenience of
statistical analysis it is necessary to know "what is the signal and what is the
noise?" Is the signal a stochastic process? Is the noise Gaussian? And so on. We
therefore need records of the noise and of the signal, each undistorted by the
other, in order to recognize their characteristics for further statistical
analysis.


A. Signal analysis. The recorded signals that met these requirements showed that
the signal is not a stochastic process but possesses an almost definite wave form,
which may be approximately represented as

$$s(t) = \begin{cases}
I_0(1 - e^{-t/T}), & t \le T_0, \\
I_0(1 - e^{-T_0/T})\,e^{-(t - T_0)/T}, & t > T_0,
\end{cases} \tag{VI.1}$$

where $I_0$ denotes the amplitude of a rectangular pulse before passing through the
instrument, $T_0$ is the duration of a signal, and $T$ is the time constant (in our
case, T = RC = 1).

In the following analysis, for the convenience of computation, the signal function
(VI.1) is digitized with sampling interval 0.25, so that the wave form of the
signal can be represented as (see Fig. VI.4)

$$s(k) = \begin{cases}
I_0(1 - e^{-T_0/T})\,e^{-(4 - T_0 - k/4)/T}, & k < \dfrac{4 - T_0}{0.25}, \\
I_0(1 - e^{-(4 - k/4)/T}), & k \ge \dfrac{4 - T_0}{0.25}.
\end{cases} \tag{VI.2}$$

Fig. VI.4 The normal signal function


In order to determine the duration of those weak signals, a comparatively clear
signal ($\alpha$-ring) was used to estimate the value $T_0$ in the interval [0.25,
1.50] by the maximum likelihood function:

  $T_0$ (s)     0.50     0.70     1.00     1.25     1.50
  $L(T_0)$     13.97    13.27    17.48    15.67    14.30

so we selected the duration $T_0 = 1.00$ for the analysis; this result is
consistent with the observation at the KAO.
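The pulse shape (VI.1) is simply the response of a first-order (RC) recorder to a rectangular pulse, and it can be tabulated as follows. This is a minimal sketch under our own assumptions: the function name and default arguments are hypothetical, and the waveform is sampled forward in time on a grid of spacing 0.25 starting at t = 0, which is one convenient indexing and not necessarily the one used in (VI.2).

```python
import numpy as np

def ring_signal(t, I0=1.0, T0=1.0, T=1.0):
    """Shape (VI.1): RC response (time constant T) to a rectangular pulse
    of amplitude I0 and duration T0 that starts at t = 0."""
    t = np.asarray(t, dtype=float)
    rise = I0 * (1.0 - np.exp(-np.clip(t, 0.0, None) / T))
    decay = I0 * (1.0 - np.exp(-T0 / T)) * np.exp(-(t - T0) / T)
    return np.where(t <= T0, rise, decay)

t = np.arange(0.0, 4.0 + 1e-9, 0.25)   # sampling interval 0.25, as in the text
s = ring_signal(t)                     # T0 = 1.00 and T = RC = 1, as selected above
```

The sampled waveform rises until t = T0 and then decays exponentially, which is the template the detection procedure below correlates with the record.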

B. Noise analysis. Based on the practical analysis and checking, the recording 
system confirmed that the noise on the record chiefly comes from two sources，one 
was the heat noise of the instruments, the other the twinkling of stars. Since the 
recording system was in stable situation, we may assume the process of the noise to 
be a stationary process with Gaussian distribution. As a matter of fact, fortunately 
we obtained some segments of the record from the observation where before the 
occultation of SAO 158687 by Uranus rings happened. Its correlation function 
(normalized) are listed in Table VI.1 and the figure is shown in Fig. VI.5. 

To identify whether the noise is a white noise, we may use the order
determination criteria AIC, BIC and HIC (see §2.5.2, Chapter 2) with the
algorithm of Levinson (Theorem 2.7, Chapter 2). By (2.87) of Theorem 2.9, we have:

$$
\left(\frac{\sigma^{(p)}}{\sigma^{(0)}}\right)^{2} = \frac{\det R_{p+1}}{\det R_{p}}.
\qquad \text{(VI.3)}
$$

Table VI.1

    k     R(k)         k     R(k)
    0     1.00         8    -0.041
    1     0.761        9    -0.010
    2     0.521       10     0.010
    3     0.309       11     0.015
    4     0.153       12    -0.007
    5     0.054       13    -0.026
    6    -0.010       14     0.007
    7    -0.054       15     0.025


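Since

$$
\begin{aligned}
\det R_1 &= 1.00, & (\sigma^{(0)})^2 &= 1.00, \\
\det R_2 &= 0.4208, & (\sigma^{(1)})^2 &= 0.4208, \\
\det R_3 &= 0.1738, & (\sigma^{(2)})^2 &= 0.4130, \\
\det R_4 &= 0.0712, & (\sigma^{(3)})^2 &= 0.4097,
\end{aligned}
$$
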

the order determination by the different criteria is as in Table VI.2.

We may see that the different criteria give the same result, that is: the noise
cannot be i.i.d. white noise, and an AR(1) model is fitted, with coefficient
$R(1) = 0.761$ and innovation scale $\sqrt{0.4208} = 0.6486$:

$$
n(t) - 0.761\,n(t-1) = 0.6486\,\varepsilon(t).
\qquad \text{(VI.4)}
$$

Table VI.2

    order s    AIC(s)     BIC(s)     HIC(s)
    0           0          0          0
    1          -0.8343    -0.8006    -0.7988
    2          -0.8218    -0.7543    -0.7506
    3          -0.7986    -0.6974    -0.6919

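C. Ring signal detection by hypothesis testing. Based on the preceding
results, we may consider the series of the observation record as the model

$$
y(t) = s(t) + n(t), \qquad \text{(VI.5)}
$$

where $s(t)$ is a deterministic signal with the known shape of (VI.2), and the noise
$n(t)$ is a Gaussian stationary series with correlation function as listed in Table VI.1.
Then, we consider the signal detection of the rings by the following test:

$$
\begin{aligned}
H_0 &: y(t) = n(t), & 1 \le t \le N; \\
H_1 &: y(t) = n(t) + s(t), & 1 \le t \le N.
\end{aligned}
\qquad \text{(VI.6)}
$$

The testing statistic is

$$
\zeta = \frac{\xi}{\sigma_\xi}, \qquad \text{(VI.7)}
$$

where $\xi = \sum_{k=1}^{N} s(t_k)\,y(t_k)$.

Under the hypothesis $H_0$, $\xi$ is a random variable with Gaussian distribution
$N(0, \sigma_\xi^2)$, where

$$
\begin{aligned}
E\xi &= \sum_{k=1}^{N} s(t_k)\,E y(t_k) = 0, \\
\sigma_\xi^2 = E\xi^2 &= \sum_{i=1}^{N}\sum_{j=1}^{N} s(t_i)s(t_j)\,E[y(t_i)y(t_j)]
= \sum_{i=1}^{N}\sum_{j=1}^{N} s(t_i)s(t_j)R_{nn}(t_i - t_j).
\end{aligned}
\qquad \text{(VI.8)}
$$

(VI.8) shows that

$$
\zeta = \frac{\sum_{k=1}^{N} s(t_k)\,y(t_k)}
{\sqrt{\sum_{i=1}^{N}\sum_{j=1}^{N} s(t_i)s(t_j)R_{nn}(t_i - t_j)}}
\qquad \text{(VI.9)}
$$

is an $N(0, 1)$ random variable free from unknown parameters.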

According to the representation of the signal (see Fig. VI.4), $s(t_i) > 0$, so the
statistic $\zeta$ can only fall on the positive side if $H_0$ is rejected. Therefore, we
select the critical region for rejecting $H_0$ as

$$
\{\zeta > u_\alpha\} \qquad \text{(VI.10)}
$$

under the significance level $\alpha$, where $u_\alpha$ can be read from the normal
distribution table.

We select $\alpha = 0.01$, then $u_\alpha = 2.32$, and $\sigma_\xi = 0.643$, so that the
rejection region is

$$
R_{0.01} = \{\xi > u_\alpha \sigma_\xi\} = \{\xi > 1.49\}. \qquad \text{(VI.11)}
$$

In order to further confirm the appearance of $s(t)$ after $H_0$ is rejected, we
introduce the second statistic

$$
\rho = \frac{\sum_{i=1}^{N} s(t_i)\,y(t_i)}
{\sqrt{\sum_{i=1}^{N} s^2(t_i)}\,\sqrt{\sum_{i=1}^{N} y^2(t_i)}}.
\qquad \text{(VI.12)}
$$

By the Schwarz inequality, we know that

$$
0 \le |\rho| \le 1, \qquad \text{(VI.13)}
$$

and $\rho = 1$ when $y(t)$ is proportional to $s(t)$. In the testing, we require that
$\rho > 0.85$ for accepting the existence of ring signals.

In brief, $H_1$ is accepted in the region

$$
R = \{\xi > 1.49;\ \rho > 0.85\}. \qquad \text{(VI.14)}
$$
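
The two-stage decision rule can be sketched as follows. This is an illustration only: the template `s` and the correlation sequence `r` below are hypothetical stand-ins (a smooth pulse and a geometric AR(1)-type correlation), not the data of the study, and the symbols follow the convention that the correlation sum is $\xi$ and its normalized version is $\zeta$.

```python
import numpy as np

def detection_statistics(y, s, r):
    """Return (xi, sigma_xi, zeta, rho) for a record y, template s and noise correlations r."""
    y = np.asarray(y, float)
    s = np.asarray(s, float)
    n = len(s)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    R = np.asarray(r, float)[lags]           # Toeplitz matrix R_nn(t_i - t_j)
    xi = float(s @ y)                        # correlation of record and template
    sigma_xi = float(np.sqrt(s @ R @ s))     # standard deviation of xi under H0
    rho = xi / float(np.sqrt((s @ s) * (y @ y)))
    return xi, sigma_xi, xi / sigma_xi, rho

def accept_ring(zeta, rho, u_alpha=2.32, rho_min=0.85):
    """Accept H1 only if both the level test and the shape test pass."""
    return zeta > u_alpha and rho > rho_min

# Hypothetical inputs for illustration only.
s = np.exp(-0.5 * ((np.arange(40) - 20) / 4.0) ** 2)
r = 0.761 ** np.arange(40)
```

Requiring $\rho$ to be large in addition to the level test rejects records that are energetic but do not resemble the template, which is why the effective significance level is below $\alpha$.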

The astronomers of Nanjing and Peking Observatories selected 31 segments of
data from the original record, each segment containing 120 sampling data,
and numbered the segments 1, 2, ..., 31. In order to eliminate any subjective
influence, the working procedure was defined as follows:

1. Astronomers in Beijing offered the numbered data segments to statisticians
of Peking University without any further information except the data.

2. Statistical research for detecting the ring signals was carried out in the
Computation Center of Peking University with a DJS-18 computer (made by Peking
University).

As soon as a ring signal was detected, we reported a vector $(N, T)$ to the
astronomers, where $N$ was the number of the segment and $T$ the position at which
the detected signal attained its maximum value.

3. Calculation of the astronomical time, position, etc. was conducted in Nanjing
Observatory and the results were sent back to Beijing upon completion.

Altogether 3720 points were investigated by us, and 12 useful pieces of
information were detected. Besides the 5 rings $\alpha$, $\beta$, $\gamma$, $\delta$ and
$\varepsilon$ which were reported by KAO, we obtained additional signals, which we
named $\lambda$, $\mu$ and $\nu$. Since the positions of $\lambda$ and $\mu$ are fairly
symmetric with respect to the center of Uranus, we considered them to be
probably the 6th ring of Uranus (see Chen et al. (1978)).

D. Statistical estimation on the amplitudes of signals. After accepting
$H_1$, i.e. after a ring signal has been identified in the record, another important
problem is to estimate the amplitude $I_0$ in (VI.2), since it relates to the optical
thickness of the rings. Based on the acceptance of $H_1$, we may assume the model
of our observation to be

$$
y(t) = n(t) + I_0 s(t), \qquad \text{(VI.15)}
$$

where $I_0$ is the unknown parameter when the maximum value of $s(t)$ is normalized
to be 1.

Now, as the distribution of $n(t)$ is Gaussian and stationary, the joint
probability density of $Y = (y(t_1), y(t_2), \ldots, y(t_N))^T$ is

$$
p(y_1, y_2, \ldots, y_N) = (2\pi)^{-N/2} (\det R_{nn})^{-1/2}
\exp\left\{ -\tfrac{1}{2} (Y^T - I_0 S^T) R_{nn}^{-1} (Y - I_0 S) \right\},
\qquad \text{(VI.16)}
$$

where

$$
R_{nn} =
\begin{pmatrix}
R_{11} & R_{12} & \cdots & R_{1N} \\
R_{21} & R_{22} & \cdots & R_{2N} \\
\vdots & \vdots & & \vdots \\
R_{N1} & R_{N2} & \cdots & R_{NN}
\end{pmatrix},
\qquad \text{(VI.17)}
$$

$$
R_{ij} = R_{nn}(t_i - t_j), \quad i, j = 1, 2, \ldots, N.
\qquad \text{(VI.18)}
$$

The log likelihood function is

$$
\log L(I_0) = \log C - \tfrac{1}{2} (Y^T - I_0 S^T) R_{nn}^{-1} (Y - I_0 S),
\qquad \text{(VI.19)}
$$

and the MLE of $I_0$ is therefore

$$
\hat{I}_0 = \frac{S^T R_{nn}^{-1} Y}{S^T R_{nn}^{-1} S},
\qquad \text{(VI.20)}
$$

where

$$
S = (s(t_1), s(t_2), \ldots, s(t_N))^T.
\qquad \text{(VI.21)}
$$
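
The estimate (VI.20) is a generalized least squares fit of the record to the template. A minimal sketch is given below; the template `s_demo` and covariance `R_demo` are hypothetical placeholders used only to exercise the function, and a linear solve is used instead of an explicit matrix inverse.

```python
import numpy as np

def amplitude_mle(y, s, R):
    """MLE of I_0 in y = n + I_0 * s with Gaussian noise covariance R, as in (VI.20)."""
    y = np.asarray(y, float)
    s = np.asarray(s, float)
    w = np.linalg.solve(np.asarray(R, float), s)   # w = R^{-1} S (R is symmetric)
    return float(w @ y) / float(w @ s)

# Hypothetical template and AR(1)-type covariance, for illustration only.
n = 30
s_demo = np.exp(-0.5 * ((np.arange(n) - 15) / 3.0) ** 2)
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
R_demo = 0.761 ** lags
```

For a noise-free record $y = I_0 s$ the estimator returns $I_0$ exactly, whatever the covariance, which is a convenient sanity check.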
The results of the statistical detection of the ring signals are listed in Tables VI.3 and
VI.4.

Table VI.3

Pre-immersion, $\sigma_\xi = 0.643$

    Ring            $\rho$     $\xi$     $t_D$         $t_C$         D
    $\alpha$        92.3%      2.37      20h29m54s     20h29m49s     1.00
    $\beta$         88.3%      1.50      20 28 01      20 28 03      0.51
    $\gamma$        84.7%      1.67      20 24 28      20 24 38      0.58
    $\delta$        92.8%      1.53      20 23 28      20 23 29      0.66
    $\varepsilon$                                                    1.40

Post-emersion, $\sigma_\xi = 0.643$

    Ring            $\rho$     $\xi$     $t_D$         $t_C$         D
    $\alpha$        95.1%      2.33      21h37m24s     21h37m24s     0.64
    $\beta$         -          -         -             -             -
    $\gamma$        86.5%      1.49      21 42 28      21 42 35      0.69
    $\delta$        88.3%      2.64      21 43 51      21 43 44      0.97
    $\varepsilon$                                                    1.82

Table VI.4

Signals $\lambda$, $\mu$ and $\nu$

    Signal          $\rho$     $\xi$     Time          $\gamma_C$
    $\lambda$       88.9%      1.74      20h34m11s     42,570 km
    $\mu$           91.2%      1.60      21 33 26      42,780 km
    $\nu$           91.6%      1.69      21 41 08      47,050 km

where $\gamma_C$ in Table VI.4 is the computed position, i.e. the distance of the
signal from the center of Uranus.

The following figures (Fig. VI.6(a)-(b)) give a comprehensive picture of the
observations of the occultation event in 1977-1978.

Fig. VI.6(a) Observation points of the occultation in the sky

3. Discussion 


The discovery of the rings of Uranus in 1977 took place more than a decade ago.
In particular, since the spacecraft Voyager 2 imaged the planet several years
ago, the information on the rings of Uranus has become much clearer than before.
The main purpose of this case study is, of course, not to announce new results
on the subject; the author only wants to show that statistical analysis is a
very powerful tool in scientific investigation.

Indeed, some scientists seem to have some misunderstanding of, or prejudice
against, statistics. They say that statistics can only offer quantitative but
intuitively determined results and can never be used to discover any new
phenomenon. Our research in this case really did discover an important event in
the solar system. We have shown that besides the 5 rings of Uranus reported by
Elliot et al. (which we checked to be true), we also discovered the existence of
a ring inside the $\alpha$-ring (named the $\lambda$-$\mu$ ring by us) by our
statistical method in the fall of 1977. Moreover, we further reported that the
signal $\nu$ in our detection is "situated between the $\beta$ and $\gamma$ rings;
it may be a part of some broken ring or some other obscuring matter". These
results have now been confirmed by the images of Voyager 2 (see Chen et al.
(1978) and Fig. VI.7).

Comparing the preceding image with the following Table VI.5 (see S. F. Dermott,
Phil. Trans. R. Soc. London, A 303 (1981)), we may see that our $\lambda$-$\mu$ ring
is just the ring 4. In the image of Voyager 2 we may also see that there is other
matter located between the $\beta$ and $\eta$ rings, and the signal $\nu$ (47,050 km)
which we detected is situated in just that area, so quite likely $\nu$ is its signal.

Unfortunately, we could not find the weak ring signals 5 and 6 in our detection.
In the presence of such strong noise interference, we had no choice but to set
the significance level of the test to $\alpha = 0.01$ in order to strengthen our
argument. This is a very stringent level for rejecting $H_0$ in our hypothesis
testing (actually, the level is less than 0.01, since the statistic $\rho$ has also
been introduced in the test). As we know in statistics, too tight a significance
level for the test will increase the error $\beta$, i.e. some of the observations
in which signals are included will be considered as noise.

Fig. VI.6(b) From Bhattacharyya et al. MOON PLANET. 21.1977.393

Another two signals of our detection were rejected in 1980 by Prof. Chen Daohan,
a researcher of Nanjing Observatory, on astronomical grounds. Having seen the
images of Voyager 2, we agree that he is correct. However, this is not at all
surprising from the statistical point of view, because even though we selected
$\alpha = 0.01$, with 3720 detections for the signals some erroneous decisions are
unavoidable. This teaches us a very important lesson: statisticians or applied
mathematicians should discuss their research findings with experts in the field
of investigation and respect their advice as well as their opinion.
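
As a rough illustration of why some false detections are to be expected, suppose (an assumption made here, not a statement of the study) that each of the 3720 investigated points were tested separately at level $\alpha = 0.01$ under $H_0$. The expected number of false rejections from the level test alone would then be

$$
3720 \times 0.01 = 37.2 .
$$

The additional requirement $\rho > 0.85$ and the dependence between overlapping windows reduce this figure considerably in practice, but the order of magnitude shows that a handful of spurious signals among the detections is statistically unremarkable.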


Fig. VI.7 Image of the rings of Uranus taken using a 96-s exposure at a phase
angle of 172.49. (From Carl D. Murray & Robert P. Thompson. NATURE. Vol.348.
6 Dec. 1990)


Table VI.5

    Ring             Distance (km)
    $\varepsilon$    51181.7 ± 33.3
    $\delta$         48333.9 ± 32.6
    $\gamma$         47657.3 ± 32.5
    $\eta$           47208.9 ± 32.5
    $\beta$          45695.6 ± 32.4
    $\alpha$         44752.3 ± 32.4
    4                42600.1 ± 32.3
    5                42272.0 ± 32.2
    6                41865.5 ± 32.1


CASE VII

On the Forecasting of Freight Transportation
by a New Model Fitting Procedure of Time Series*


1. Introduction 


Up to now, railway transportation in China still lags far behind public demand,
and every train is overloaded with passengers and cargo, although railway
construction has developed greatly since 1949. How to increase the
transportation capacity further has for many years been a serious problem for
everyone working on the railway. At present, the best solution for China is to
make use of the existing transportation facilities as fully as possible. In
freight transportation, for example, the freight cars are not fully utilized,
since the yard marshals often do not know how many cars they will need to
transport the total amount of cargo over a medium or long term period, say one
month or six months. Sometimes some stations are short of freight cars to
transport the cargo commissioned, while many such cars are running idle at other
stations, and yard marshals can only make plans a week ahead from their
experience, but not longer.

The difficulty in making a medium or long term prediction is that the recorded
transport amount is not a deterministic function but a random process. Fig. VII.1
is a real record of the freight transportation at a station in south China (we
shall call it the "Z-station" hereafter).

Fig. VII.1 The freight transport record in a station during the period 1975-1985
in south China

Evidently, making a forecast for such a complicated process is not so easy.

In time series analysis, a very famous and widely used method for forecasting
is the X-11 procedure (see A.1 in Appendix VII of this case study and the related
papers).

In the X-11 program, a model with mixed components of stochastic processes is
considered, i.e. the observation $x(t)$ can be represented as

$$
x(t) = T(t) + S(t) + I(t), \qquad \text{(VII.1)}
$$

where $T(t)$, $S(t)$ and $I(t)$ are the trend, seasonal and irregular components
respectively.

*The original problem of this case was suggested by Professors Hou Zhenting and
Xie Shihao of Changsha Institute of Railway, Hunan Province, China.

Using a series of filtrations (see A.1 in Appendix VII), X-11 offers estimates
of the trend $T_{11}(t)$, the seasonal component $S_{11}(t)$ and the irregular
component $I_{11}(t)$. X-11 is widely used in practical research because the
method is simple to compute and sometimes offers very satisfactory results for
the forecasting problem; moreover, it does not require knowledge of models or
stationarity of the data.

Based on 102 observations from the Z-station, the trend, seasonal and irregular
components decomposed by X-11 are shown in Figs. VII.2, VII.3 and VII.4.

Using the X-11 procedure for these data, we obtain 10-step (month) forward
predictions, which are listed in Table VII.1.

The average error percentage is AEP = 6.85%.

The definition of AEP is

$$
\mathrm{AEP} = \frac{1}{10} \sum_{k=1}^{10} \frac{|x(k) - \hat{x}(k)|}{x(k)},
\qquad \text{(VII.2)}
$$

where $x(k)$ is the real data and $\hat{x}(k)$ the value predicted by X-11.

Generally speaking, the result for such a complicated process is not bad at all;
an AEP of 6.85% is good enough for application in many areas. Unfortunately,
this result could not be accepted by the yard marshals, since they consider not
only the average prediction error percentage but also the accuracy of the
forecast of each extreme value. In Table VII.1 we may see that there are large
prediction errors for $k = 4$ and $k = 5$.

Fig. VII.3 Seasonal component of freight transportation decomposed by X-11

Fig. VII.4 Irregular component of freight transportation decomposed by X-11


Table VII.1

    Step k    Real value    Pred. value    Err. perct.
    1         15537         16133          3.8%
    2         15992         15559          2.7%
    3         16945         15658          7.6%
    4         19391         16180          16.5%
    5         20182         16479          18.3%
    6         16861         15774          6.4%
    7         15894         15567          2.1%
    8         16874         15971          5.3%
    9         16103         15836          1.6%
    10        16227         15531          4.2%
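
The error percentages and the AEP can be recomputed directly from the real and predicted values of Table VII.1. The sketch below assumes the AEP is the plain average of the relative absolute errors, each taken with respect to the real value; small differences from the printed 6.85% come from rounding of the individual percentages.

```python
REAL = [15537, 15992, 16945, 19391, 20182, 16861, 15894, 16874, 16103, 16227]
PRED_X11 = [16133, 15559, 15658, 16180, 16479, 15774, 15567, 15971, 15836, 15531]

def error_percentages(real, pred):
    """Relative absolute error of each prediction, in percent of the real value."""
    return [100.0 * abs(x - xh) / x for x, xh in zip(real, pred)]

def aep(real, pred):
    """Average error percentage over all forecast steps."""
    errs = error_percentages(real, pred)
    return sum(errs) / len(errs)

errs_x11 = error_percentages(REAL, PRED_X11)
worst_step = max(range(len(errs_x11)), key=errs_x11.__getitem__) + 1   # 1-based step k
```

Looking at `worst_step` alongside the average makes the yard marshals' objection concrete: a moderate AEP can hide a very large error at the peak months.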


Other researchers also suggested the following method to us:

a. Remove the trend $T(t)$ by a regression method.

b. Put $y(t) = x(t) - \hat{T}(t)$, then fit a model $Z(t)$ by the M.E. fitting
procedure which we have introduced in Chapter 2.

c. Make the $k$-step forward forecast $\hat{Z}_t(k)$ for the AR(P) model, based on
the data $\{x(s), s \le t\}$ (see Chapter 3).

Then the total prediction is

$$
\hat{x}(t + k) = \hat{T}(t + k) + \hat{Z}_t(k), \quad k = 1, 2, \ldots, 10.
\qquad \text{(VII.3)}
$$

Accordingly, we use the preceding procedure for our data and obtain

$$
x(s) = 11970 + 40.94\,s + Z(s), \qquad \text{(VII.4)}
$$

where $Z(s)$ is AR(6), fitted by the algorithm of Marple (1980) with order
selection by BIC, satisfying the equation

$$
\begin{aligned}
Z(s) = {} & 0.3258Z(s-1) + 0.0009Z(s-2) + 0.015Z(s-3) + 0.1207Z(s-4) \\
& - 0.078Z(s-5) - 0.2581Z(s-6) + \varepsilon(s).
\end{aligned}
\qquad \text{(VII.5)}
$$

The predictions of (VII.3) are listed in Table VII.2.


Table VII.2

    Step k    Real value    Pred. value    Err. perct.
    1         15817         15472          2.18%
    2         16984         16027          5.63%
    3         21089         16047          23.9%
    4         17917         15801          11.8%
    5         16029         15728          1.87%
    6         16317         15962          2.17%
    7         15711         15803          0.58%
    8         14528         15600          7.3%
    9         17701         15578          12.0%
    10        16075         15709          2.27%
    11        15537         15768          1.48%
    12        15992         15746          1.54%

AEP = 6.06%

The same problem still exists in Table VII.2: the prediction error for the
extreme value at $k = 3$ is still too large.

2. A new model fitting procedure for freight transportation prediction

No doubt, the three components $T(t)$, $S(t)$ and $I(t)$ considered by X-11 are
reasonable, and a broad field of practical problems can be covered by such a
model. The problem is how to detect those components more accurately for
forecasting.

Recall that in Case I we studied the optimum filtration problem of extracting a
slowly varying component by a Chebyshev filter, which can be designed for an
arbitrary truncated frequency $\alpha$ and possesses very good properties outside
the band $[\alpha, \pi]$.

In Case V, we introduced the HSY algorithm for detecting the frequency
components of a weak-P model, which does not need the number of periods to be
known in advance.

According to the preceding results, we naturally consider a new procedure for
decomposing the trend component $T(t)$ and the seasonal component $S(t)$ (now to
be understood as a periodic component). The procedure is as follows:


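(1). Trend component removal.

In Case I, we have obtained the OCFF as (see (1.70))

$$
H^*(\lambda) =
\begin{cases}
\delta \cos\left(N \cos^{-1} \varphi(\lambda)\right), & \alpha \le \lambda \le \pi; \\[1mm]
\dfrac{\delta}{2}\left[\left(\varphi(\lambda) + \sqrt{\varphi^2(\lambda) - 1}\right)^N
+ \left(\varphi(\lambda) - \sqrt{\varphi^2(\lambda) - 1}\right)^N\right], & 0 \le \lambda < \alpha,
\end{cases}
\qquad \text{(VII.6)}
$$

where

$$
\varphi(\lambda) = \frac{2C_\lambda - a + 1}{1 + a},
\qquad \text{(VII.7)}
$$

with $a = \cos\alpha$ and $C_\lambda = \cos\lambda$; $\alpha$ is the truncated frequency
for filtration, and

$$
\delta = \max_\lambda |H^*(\lambda)|. \qquad \text{(VII.8)}
$$

Now, according to our situation ($\Delta = 1$ month), we may consider the trend
component as one which exhibits a variation not shorter than 12 months in
period, so the following parameters are selected:

$$
\alpha = 0.5236, \qquad \text{(VII.9)}
$$

$$
a = \cos\alpha = 0.86602. \qquad \text{(VII.10)}
$$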

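Again from (1.76) in Case I, we know that the number of terms $N$ for the filter
can be determined by the following inequality

$$
N \ge \frac{\log(2/\delta)}{\log\left(\varphi(0) + \sqrt{\varphi^2(0) - 1}\right)},
\qquad \text{(VII.11)}
$$

where $\varphi(0) = \dfrac{3 - a}{1 + a} = 1.1435$. Select $\delta = 0.001$, then

$$
N \ge \frac{7.6009}{0.5295} = 14.35. \qquad \text{(VII.12)}
$$

By (VII.12) we may select $N = 15$ for the filtration.

Putting

$$
\lambda_k = \frac{2\pi k}{2N + 1}, \quad k = 0, \pm 1, \pm 2, \ldots, \pm N,
\qquad \text{(VII.13)}
$$

into (VII.6), the filtering coefficients in the time domain are

$$
h(\mu) = \frac{1}{2N + 1}\left[ H^*(0) + 2 \sum_{k=1}^{N} H^*(k) \cos\frac{2\pi k \mu}{2N + 1} \right],
\qquad \text{(VII.14)}
$$

where $\mu = 0, \pm 1, \pm 2, \ldots, \pm N$ and $H^*(k) = H^*(\lambda_k)$. $\{h(\mu)\}$ and
$\{H^*(k)\}$ are listed in Table VII.3.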


Table VII.3

    k     H(k)        h(k)          k     H(k)        h(k)
    0     1408.9      0.1322        8     -0.2593     0.0467
    1     758.32      0.1301        9     -0.1682     0.0384
    2     75.22       0.1249        10    0.1063      0.0277
    3     -0.0185     0.1163        11    -0.0672     0.0190
    4     -0.9995     0.1051        12    0.0755      0.0122
    5     0.8100      0.0922        13    -0.0209     0.0072
    6     -0.5792     0.0782        14    0.0102      0.0037
    7     0.3916      0.0641        15    -0.003      0.0019

where $\{h(k)\}$ has been normalized into

$$
\sum_k h(k) = 1. \qquad \text{(VII.15)}
$$
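
The design above can be checked numerically. The sketch below is an illustration under stated assumptions: it evaluates the Chebyshev polynomial $T_N(\varphi(\lambda_k))$ with $\delta$ set to 1 (which appears to be the scaling of the $H(k)$ column of Table VII.3), computes the bound on $N$ with natural logarithms, and builds the two-sided coefficients by (VII.14). The final normalization to unit gain at $\lambda = 0$ over all $2N + 1$ coefficients is a choice made here; the $h(k)$ column of the table sums to 1 over $k = 0, \ldots, 15$ only.

```python
import math
import numpy as np

def chebyshev_T(n, x):
    """Chebyshev polynomial T_n(x): cosine form on [-1, 1], hyperbolic form outside."""
    x = np.asarray(x, float)
    inside = np.abs(x) <= 1.0
    t_in = np.cos(n * np.arccos(np.clip(x, -1.0, 1.0)))
    t_out = np.sign(x) ** n * np.cosh(n * np.arccosh(np.maximum(np.abs(x), 1.0)))
    return np.where(inside, t_in, t_out)

alpha = 0.5236                      # truncated frequency, 2*pi/12
a = math.cos(alpha)
delta = 0.001
phi0 = (3.0 - a) / (1.0 + a)        # phi(0)
N_min = math.log(2.0 / delta) / math.acosh(phi0)
N = math.ceil(N_min)

k = np.arange(N + 1)
lam = 2.0 * np.pi * k / (2 * N + 1)
phi = (2.0 * np.cos(lam) - a + 1.0) / (1.0 + a)
H = chebyshev_T(N, phi)             # response with delta = 1

mu = np.arange(-N, N + 1)
phase = 2.0 * np.pi * np.outer(k[1:], mu) / (2 * N + 1)
h = (H[0] + 2.0 * (H[1:, None] * np.cos(phase)).sum(axis=0)) / (2 * N + 1)
h /= h.sum()                        # unit gain at zero frequency (a choice made here)
```

In this sketch the samples $\lambda_k$ with $k \ge 3$ fall in the stopband $[\alpha, \pi]$, where $|T_N| \le 1$, while the passband samples $k = 0, 1, 2$ are large; this is the pattern seen in the $H(k)$ column.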

One troublesome problem always arises in two-sided (non-causal) filtering for
data analysis: such filtering cannot produce the last segment of the output, nor
the first segment. The output of a non-causal filter may be represented as

$$
y(t_0) = \sum_{k=-N}^{N} h_k\, x(t_0 + k), \quad h_k = h_{-k}.
\qquad \text{(VII.16)}
$$

So, suppose that the final index of the data is $M$; then the last output of the
filter is

$$
y(M - N) = \sum_{k=-N}^{N} h_k\, x(M - N + k),
\qquad \text{(VII.17)}
$$

and the last segment $y(M - N + 1), \ldots, y(M)$ as well as the first segment
$y(1), y(2), \ldots, y(N)$ cannot be obtained from (VII.17) (see Fig. VII.5).

Fig. VII.5 $\{y(M - N + 1), \ldots, y(M)\}$ could not be obtained from a non-causal
filter


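Accordingly, we may design a one-sided filter with real coefficients, whose
frequency response function

$$
H_0(\lambda) = \sum_{l=0}^{N} c_l e^{-il\lambda}, \quad -\pi \le \lambda \le \pi,
\qquad \text{(VII.18)}
$$

is required to be an optimum approximation to the OCFF $H^*(\lambda)$ by the LME
criterion. For this purpose, we may put

$$
\lambda_k = \frac{2\pi k}{2N + 1}, \qquad
Q(a) = \sum_{k=0}^{N} \left| \sum_{l=0}^{N} a_l e^{-il\lambda_k} - H^*(\lambda_k) \right|^2,
\qquad \text{(VII.19)}
$$

and require $\{c_k\}$ to satisfy

$$
Q(c) = \min_a \{Q(a)\}. \qquad \text{(VII.20)}
$$

These optimum coefficients $\{c_k\}$ can be obtained from

$$
\frac{\partial Q(a)}{\partial a} = 0, \qquad \text{(VII.21)}
$$

or equivalently from the following equations:

$$
\left( \sum_{k} \cos\left((l - s)\lambda_k\right) \right)_{0 \le l, s \le N}
(c_0, c_1, \ldots, c_N)^T
= \left( \sum_{k} V_k \cos(s\lambda_k) \right)_{0 \le s \le N},
\qquad \text{(VII.22)}
$$

where $V_k = H^*(\lambda_k)$. Put

$$
\begin{aligned}
R(l, s) = R(s, l) &= \sum_{k=0}^{N} \cos \lambda_k (l - s), & l, s &= 0, 1, \ldots, N, \\
\gamma_s &= \sum_{k=0}^{N} V_k \cos(s\lambda_k), & s &= 0, 1, \ldots, N,
\end{aligned}
\qquad \text{(VII.23)}
$$

then we may have the coefficients $\{c_k\}$ by solving the equation

$$
\begin{pmatrix}
R(0,0) & \cdots & R(0,N) \\
\vdots & & \vdots \\
R(N,0) & \cdots & R(N,N)
\end{pmatrix}
\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_N \end{pmatrix}
=
\begin{pmatrix} \gamma_0 \\ \gamma_1 \\ \vdots \\ \gamma_N \end{pmatrix},
\qquad \text{(VII.24)}
$$

where $R(l, s)$ in (VII.23) may be simplified as

$$
R(l, s) =
\begin{cases}
\dfrac{1}{2} + \dfrac{\cos(N\Delta(l, s)) - \cos((N + 1)\Delta(l, s))}{2\left(1 - \cos\Delta(l, s)\right)}, & l \ne s, \\[2mm]
N + 1, & l = s,
\end{cases}
\qquad \text{(VII.25)}
$$

with

$$
\Delta(l, s) = \frac{2\pi(l - s)}{2N + 1}, \quad l, s = 0, 1, \ldots, N.
\qquad \text{(VII.26)}
$$

Therefore, using $V_k = H^*(\lambda_k)$, which have been listed in Table VII.3, we have
the following results in Table VII.4 by (VII.24).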

Applying the one-sided filter $\{c_k, k = 0, 1, \ldots, 15\}$ (which has been
normalized so that $\sum_k c_k = 1$) to the last segment of the observation data, we
have

$$
y(k) = \sum_{l=0}^{N} c_l\, x(k - l), \quad k = M - N + 1, \ldots, M,
\qquad \text{(VII.27)}
$$

which are missed in the two-sided filtration.
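
The bookkeeping of the two filters can be sketched as follows. This is an illustration with hypothetical short coefficient vectors, not the coefficients of Tables VII.3 and VII.4; indices are 0-based, so the two-sided output exists for $t = N, \ldots, M - N - 1$ and the one-sided filter supplies the last $N$ points.

```python
import numpy as np

def two_sided_output(x, h_half):
    """y(t) = sum_{k=-N}^{N} h_k x(t+k) with h_{-k} = h_k, given h_half = [h_0, ..., h_N]."""
    x = np.asarray(x, float)
    h_half = np.asarray(h_half, float)
    h = np.concatenate([h_half[:0:-1], h_half])   # h_N, ..., h_1, h_0, h_1, ..., h_N
    return np.convolve(x, h, mode="valid")        # M - 2N values; the ends are lost

def one_sided_tail(x, c):
    """y(k) = sum_{l=0}^{N} c_l x(k-l) for the last N time points."""
    x = np.asarray(x, float)
    c = np.asarray(c, float)
    N, M = len(c) - 1, len(x)
    return np.array([c @ x[k - N:k + 1][::-1] for k in range(M - N, M)])

# Hypothetical coefficients with unit gain, for illustration only.
h_demo = [0.4, 0.2, 0.1]        # h_0 + 2*(h_1 + h_2) = 1
c_demo = [0.5, 0.3, 0.2]        # c_0 + c_1 + c_2 = 1
x_demo = np.full(50, 3.0)
trend = np.concatenate([two_sided_output(x_demo, h_demo), one_sided_tail(x_demo, c_demo)])
```

Concatenating the two outputs gives a trend estimate for every index from $N$ to the end of the record; only the first $N$ points remain unavailable.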

In order to obtain a smoother curve for the whole trend, we refiltered the
output of both filters mentioned above,

$$
\{y(1 + N), y(2 + N), \ldots, y(M - N)\}; \quad
\{y(M - N + 1), y(M - N + 2), \ldots, y(M)\},
\qquad \text{(VII.28)}
$$

by the one-sided filter (VII.27) again. The result is illustrated in Fig. VII.6.

Table VII.4

    k     c_k         k     c_k
    0     1.40        8     0.8088
    1     1.3961      9     0.7196
    2     1.3471      10    0.6428
    3     1.2842      11    0.5780
    4     1.2040      12    0.5289
    5     1.1100      13    0.4913
    6     1.0009      14    0.4665
    7     0.9061      15    0.4528

Fig. VII.6 The observed data of freight transportation in Z-station in China
and its trend component

(2). Harmonic components detection.

In Chapter 3, we have introduced the hidden periodicity analysis of the weak-P
model

$$
y(t) = \sum_{k=1}^{P} A_k \cos(\lambda_k t + \theta_k) + \xi(t),
\qquad \text{(VII.29)}
$$

where $P$ and $\{A_k, \lambda_k, \theta_k;\ k = 1, 2, \ldots, P\}$ are unknown parameters,
and $\xi(t)$ is a stationary series with fourth order moment satisfying the weak-P
conditions (see (3.129)). As we have stated before, since weak-P series cover
many kinds of random sequences in practice, we may apply the HSY algorithm for
our purpose.

Put $y(t) = x(t) - T(t)$, $t = N + 1, \ldots, M$, and consider it as the model of
(VII.29). Detecting the hidden periodicities by the HSY algorithm as introduced
in Theorem 3.6, Chapter 3, we have the following estimated parameters:

$$
\begin{aligned}
\hat{P} = 3, \quad \hat{\lambda}_1 &= 0.49 \quad (\text{period } T_1 = 12.8 \text{ months}), \\
\hat{\lambda}_2 &= 1.03 \quad (\text{period } T_2 = 6.1 \text{ months}), \\
\hat{\lambda}_3 &= 1.57 \quad (\text{period } T_3 = 4 \text{ months}).
\end{aligned}
\qquad \text{(VII.30)}
$$

From Theorem 3.8 we know that the power of the convergence speed of the
frequency estimates $\{\hat{\lambda}_k\}$ is of the order $\beta > 1$, but only
$1/16 < \beta$ for the amplitude estimates $\{\hat{A}_k\}$. Hence, we improve the
estimates of the amplitudes and phases by minimizing the following quadratic
function:

$$
\min_{\{A_l, \theta_l\}} \sum_{k} \left( y(k) - \sum_{l=1}^{3} A_l \cos(k\hat{\lambda}_l + \theta_l) \right)^2,
\qquad \text{(VII.31)}
$$

and the result is

$$
\begin{aligned}
A_1 &= 1750, & \theta_1 &= -3.1, \\
A_2 &= 1050, & \theta_2 &= -1.5, \\
A_3 &= 950, & \theta_3 &= 1.2,
\end{aligned}
\qquad \text{(VII.32)}
$$

corresponding to the frequencies of (VII.30) respectively.

Hence, we have the harmonic component model of the observation as

$$
\begin{aligned}
S(t) = {} & 1750\cos(0.49(t - t_0) - 3.1) + 1050\cos(1.03(t - t_0) - 1.5) \\
& + 950\cos(1.57(t - t_0) + 1.2) + \xi(t),
\end{aligned}
\qquad \text{(VII.33)}
$$

where $t_0$ is some starting point.
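
Once the frequencies are fixed, the minimization (VII.31) is a linear least squares problem, because $A\cos(\lambda t + \theta) = A\cos\theta\cos\lambda t - A\sin\theta\sin\lambda t$. The sketch below illustrates this with synthetic, noise-free data generated from the values in (VII.32); it is not the program used in the study, and the time grid `t_demo` is a hypothetical choice.

```python
import numpy as np

def fit_amplitudes_phases(t, y, freqs):
    """Least squares amplitudes and phases of sum_k A_k cos(freq_k * t + theta_k)."""
    t = np.asarray(t, float)
    y = np.asarray(y, float)
    cols = []
    for lam in freqs:
        cols += [np.cos(lam * t), np.sin(lam * t)]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    a, b = coef[0::2], coef[1::2]           # a = A cos(theta), b = -A sin(theta)
    return np.hypot(a, b), np.arctan2(-b, a)

FREQS = [0.49, 1.03, 1.57]
t_demo = np.arange(1, 88)                    # hypothetical time grid
y_demo = (1750 * np.cos(0.49 * t_demo - 3.1)
          + 1050 * np.cos(1.03 * t_demo - 1.5)
          + 950 * np.cos(1.57 * t_demo + 1.2))
amps, phases = fit_amplitudes_phases(t_demo, y_demo, FREQS)
```

With real data the residual of this fit is the series that is passed on to the AR model fitting in the next step.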


(3). Model fitting for the residual.

Put

$$
Z(t) = x(t) - T(t) - S(t), \qquad \text{(VII.34)}
$$

where $t = N + 1, N + 2, \ldots, M$; then under the criterion of maximum entropy
(M.E.), we may fit a model for $Z(t)$. The procedure may be found in
(2.153)-(2.156).

Under the order selection by BIC as well as by HIC, with the algorithm of Marple
(1980), the residual $Z(t)$ in (VII.34) is fitted as AR(9):

$$
\begin{aligned}
& Z(t) + 0.0981Z(t-1) + 0.0560Z(t-2) + 0.06Z(t-3) \\
& \quad - 0.2054Z(t-4) - 0.1949Z(t-5) - 0.2088Z(t-6) \\
& \quad - 0.1388Z(t-7) + 0.2211Z(t-8) + 0.1178Z(t-9) = 662.7\,\varepsilon(t),
\end{aligned}
\qquad \text{(VII.35)}
$$

where $\varepsilon(t)$ is the standard white noise.


3. Forecasting for freight transportation of practical data


In the preceding discussion, we have obtained the models of $S(t)$ and $Z(t)$, but
for the trend $T(t)$ we have only the output data $\{T(N + 1), \ldots, T(M)\}$ and no
analytic representation.

For the purpose of making forward predictions, it is necessary to give a
functional form for $T(t)$. Since $T(t)$ is a slowly varying function, a third
order polynomial approximation by a spline function may be good enough.

Now, starting from the index $N + 1$ as the first point, and selecting some points
$\{(t_i, T(t_i))\}$ as the coincidence points for the spline fitting, we have the
following results (see Table VII.5).


Table VII.5

    i    (t_i, T(t_i))    a(i)     b(i)       c(i)        d(i)
    1    (1, 13214)       13214    -26.407    0.009106    -8.41E-4
    2    (11, 12950)      12950    96.2395    -0.01612    5.1975E-4
    3    (33, 15065)      15065    53.1244    0.01818     -2.220E-3
    4    (40, 15437)      15437    41.4196    -0.02846    4.907E-3
    5    (45, 15644)      15644    15.2357    0.04515     -4.8E-3
    6    (51, 15736)      15736    72.5843    -0.04152    1.977E-3
    7    (58, 16243)      16243    0          0           0

The mathematical formulation of $\hat{T}(t)$ by the spline function is

$$
\hat{T}(t) = d(i)(t - t_i)^3 + c(i)(t - t_i)^2 + b(i)(t - t_i) + a(i),
\qquad \text{(VII.36)}
$$

where $t_i \le t < t_{i+1}$, $i = 1, 2, \ldots, 7$. The fitted $\hat{T}(t)$ vs. $T(t)$ is
illustrated in Fig. VII.7.

Fig. VII.7 Spline fitting $\hat{T}(t)$ for the trend $T(t)$
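
The piecewise cubic (VII.36) is straightforward to evaluate from the coefficients of Table VII.5. The following is a sketch only: the segment lookup by bisection and the constant continuation beyond the last knot (where all higher coefficients are zero) are implementation choices made here, and $a(4)$ is taken as the knot ordinate 15437 so that the curve is continuous at $t = 45$.

```python
import bisect

KNOTS = [1, 11, 33, 40, 45, 51, 58]
A = [13214, 12950, 15065, 15437, 15644, 15736, 16243]   # a(4) set to the knot value 15437
B = [-26.407, 96.2395, 53.1244, 41.4196, 15.2357, 72.5843, 0.0]
C = [0.009106, -0.01612, 0.01818, -0.02846, 0.04515, -0.04152, 0.0]
D = [-8.41e-4, 5.1975e-4, -2.220e-3, 4.907e-3, -4.8e-3, 1.977e-3, 0.0]

def segment_value(i, dt):
    """Cubic of segment i (0-based) evaluated at offset dt from its left knot."""
    return ((D[i] * dt + C[i]) * dt + B[i]) * dt + A[i]

def trend_spline(t):
    """Evaluate the fitted trend at time index t (t >= 1)."""
    i = max(bisect.bisect_right(KNOTS, t) - 1, 0)
    return segment_value(i, t - KNOTS[i])

# Largest mismatch between each segment's right end and the next knot ordinate.
max_gap = max(abs(segment_value(i, KNOTS[i + 1] - KNOTS[i]) - A[i + 1]) for i in range(6))
```

Each segment reproduces the ordinate of the next knot to within rounding of the printed coefficients, which is a quick consistency check on the table.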

Now, the $n$-step forward prediction formula for freight transportation based on
the data $\{x(s), s \le M\}$ is

$$
\hat{x}_M(n) = \hat{T}(M + n) + S(M + n) + \hat{Z}_M(n),
\qquad \text{(VII.37)}
$$

where $\hat{Z}_M(n)$ is the $n$-step forward prediction of the AR(P) model (see
Theorem 3.1 or (3.25)),

$$
\hat{Z}_M(n) = -\sum_{k=1}^{P} \varphi_k \hat{Z}_M(n - k), \quad n = 1, 2, \ldots,
\qquad \text{(VII.38)}
$$

where

$$
\begin{aligned}
\hat{Z}_M(n - k) &= Z(M + n - k), \quad \text{if } n - k \le 0, \\
Z(t) &= x(t) - T(t) - S(t), \quad t = N + 1, N + 2, \ldots, M.
\end{aligned}
\qquad \text{(VII.39)}
$$

Now, $P = 9$, and $\{\varphi_1, \varphi_2, \ldots, \varphi_9\}$ have been shown in
(VII.35). Accordingly, we have the following forecasting results for the freight
transportation (see Table VII.6).

This result not only shows that the prediction is more accurate than the others
(see Tables VII.1 and VII.2); more importantly, our method can predict the
extreme values ($n = 4, 5$) much better than the former methods.
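
The recursion (VII.38)-(VII.39) can be sketched as below. The coefficient list `PHI_AR9` is read from (VII.35) under the convention $Z(t) + \sum_k \varphi_k Z(t-k) = \sigma\varepsilon(t)$; the short series used in the check is hypothetical.

```python
def ar_forecast(z, phi, steps):
    """n-step forecasts for Z(t) + sum_k phi[k-1] * Z(t-k) = sigma * e(t).

    Observed values are used while available, earlier forecasts afterwards.
    """
    hist = list(z)
    if len(hist) < len(phi):
        raise ValueError("need at least len(phi) past values")
    out = []
    for _ in range(steps):
        nxt = -sum(p * hist[-k] for k, p in enumerate(phi, start=1))
        hist.append(nxt)
        out.append(nxt)
    return out

# Coefficients phi_1..phi_9 as read from (VII.35).
PHI_AR9 = [0.0981, 0.0560, 0.06, -0.2054, -0.1949, -0.2088, -0.1388, 0.2211, 0.1178]
```

The full forecast then adds the spline trend and the harmonic component evaluated at $M + n$ to these values, as in (VII.37).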

Table VII.6

    n     Real     Pred.
    1     15537    14924
    2     15992    13594
    3     16945    16435
    4     19391    19395
    5     20182    19953
    6     16861    18284
    7     15894    16882
    8     16874    17343
    9     16103    16850
    10    16227    15926

AEP = 4.68%

Now, we summarize the model fitting and forecasting procedure given in this case
study:

Step 1. Filter the original data $\{x(1), x(2), \ldots, x(M)\}$ by a filter with
two-sided coefficients $\{h_k, k = 0, \pm 1, \pm 2, \ldots, \pm 15\}$ as listed in Table
VII.3 and obtain the output $\{y(16), y(17), \ldots, y(M - 15)\}$.


Step 2. Apply a one-sided filter

$$
y(k) = \sum_{l=0}^{15} c_l\, x(k - l), \quad k = M - 14, M - 13, \ldots, M,
\qquad \text{(VII.40)}
$$

to the observations $\{x(M - 29), x(M - 28), \ldots, x(M)\}$ to obtain a complete
output

$$
Y = \{y(16), \ldots, y(M - 15);\ y(M - 14), \ldots, y(M)\},
$$

where the coefficients of (VII.40) are listed in Table VII.4 and normalized to be

$$
\sum_{l=0}^{15} c_l = 1. \qquad \text{(VII.41)}
$$

Step 3. Filter the data $Y$ with the one-sided filter (VII.40) again to obtain
a smooth trend estimate $T(t)$, $t = 16, 17, \ldots, M$.

Step 4. Fit a spline function $\hat{T}(t)$ to the trend output data $T(t)$ with some
designated points $(t_i, T(t_i))$, $i = 1, 2, \ldots, J$. The program can be found in
A.3, Appendix VII of this case.

Step 5. Put $y(t) = x(t) - T(t)$, $t = 16, 17, \ldots, M$, and consider it as a weak-P
model (VII.29); then use some hidden periodicity analysis procedure, e.g. the HSY
algorithm, to obtain the order $P$ and the parameters
$\{(\lambda_i, A_i, \theta_i),\ i = 1, 2, \ldots, P\}$. Then we have the harmonic
component of $x(t)$ as

$$
S(t) = \sum_{k=1}^{P} A_k \cos(\lambda_k (t - t_0) + \theta_k),
\qquad \text{(VII.42)}
$$

where $t_0$ is some appropriate starting point, since $S(t)$ is a periodic function.

Step 6. Let $Z(t) = y(t) - S(t)$, $t = 16, 17, \ldots, M$, and fit a stochastic model
by M.E. with the Marple algorithm under the BIC (or HIC) criterion. Then we have
the AR(P) equation

$$
\sum_{k=0}^{P} \varphi_k Z(t - k) = \theta_0\, \varepsilon(t).
\qquad \text{(VII.43)}
$$

Step 7. The final forecast formula is

$$
\begin{aligned}
\hat{x}_M(n) &= \hat{T}(M + n) + S(M + n) + \hat{Z}_M(n) \\
&= \mathrm{spl}(M + n) + \sum_{k=1}^{P} A_k \cos(\lambda_k (M + n - t_0) + \theta_k)
- \sum_{k=1}^{P} \varphi_k \hat{Z}_M(n - k),
\end{aligned}
\qquad \text{(VII.44)}
$$

and $\hat{Z}_M(n - k) = Z(M + n - k)$ if $n - k \le 0$, $n = 1, 2, \ldots, m_0$.

4. Discussion

The result of X-11 in our case, as shown in Figs. VII.2, VII.3 and VII.4, has
some demerits. First, we may see that there are some comparatively high
frequencies involved in the trend component in Fig. VII.2. That is not a
surprise since, from the point of view of spectral analysis, the frequency
response function $F_T(\lambda)$ for extracting the trend component can be
represented as

$$
F_T(\lambda) = F_3(\lambda)\left(1 - F_2(\lambda)\left(1 - F_1(\lambda)\right)\right)
\qquad \text{(VII.45)}
$$

(see A.1, Appendix VII); its graph is shown in Fig. VII.8.

Evidently, such a filter has a very large side-lobe, so that the "high
frequencies" from $\omega_1 = 14\pi/64$ to $\omega_2 = 18\pi/64$, approximately 7 to
10 months in period, are partly included in the output of the trend component.
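
The composite response (VII.45) can be evaluated from the filter coefficients given in A.1 of Appendix VII. The sketch below assumes each $F_i$ is a symmetric filter, so that its frequency response is the real cosine sum $h_0 + 2\sum_{k>0} h_k\cos k\lambda$; the grid `grid` is a hypothetical choice made to match the $\pi/64$ units of the figure axis.

```python
import numpy as np

def symmetric_response(weights, lam):
    """F(lambda) = h_0 + 2 * sum_{k>0} h_k cos(k*lambda) for weights given as {lag: h_k}."""
    lam = np.asarray(lam, float)
    out = np.full_like(lam, weights.get(0, 0.0))
    for k, hk in weights.items():
        if k > 0:
            out = out + 2.0 * hk * np.cos(k * lam)
    return out

F1_W = {0: 1 / 12, **{k: 1 / 12 for k in range(1, 6)}, 6: 1 / 24}
F2_W = {0: 0.333, 12: 0.222, 24: 0.111}
F3_W = {0: 0.24, 1: 0.214, 2: 0.147, 3: 0.066, 4: 0.0, 5: -0.028, 6: -0.019}

def trend_response(lam):
    """F_T = F_3 * (1 - F_2 * (1 - F_1)), as in (VII.45)."""
    f1 = symmetric_response(F1_W, lam)
    f2 = symmetric_response(F2_W, lam)
    f3 = symmetric_response(F3_W, lam)
    return f3 * (1.0 - f2 * (1.0 - f1))

grid = np.arange(0, 65) * np.pi / 64       # lambda in units of pi/64
gain = np.abs(trend_response(grid))
```

Inspecting `gain` over the grid points 14 to 18 gives the side-lobe level discussed above, while the response at $\lambda = 0$ equals 1 and the first filter vanishes at the 12-month frequency.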

Secondly, we have shown in Table VII.1 that the prediction of X-11 does not
offer good results for the extreme values, $k = 4, 5$. This fact is not strange
either: comparing the seasonal component $S_{11}(t)$ by X-11 and the harmonic
component $S(t)$ of (VII.33) by the HSY method with the original data
$y(t) = x(t) - T(t)$, we find that the peaks of $y(t)$ are fitted much better by
$S(t)$ than by $S_{11}(t)$ (see Fig. VII.9).

Fig. VII.8 Frequency response function of the filter in X-11 for decomposing
the trend $T_{11}(t)$ (horizontal axis in units of $\pi/64$).

Fig. VII.9 Seasonal component $S_{11}(t)$, harmonic component $S(t)$ and original
data $y(t)$.

An interesting though puzzling problem arises in analyzing the frequency
response function of the filter

$$
F_S(\lambda) = F_4(\lambda)\left(1 - F_3(\lambda)\left(1 - F_2(\lambda)\left(1 - F_1(\lambda)\right)\right)\right),
$$

which is illustrated in Fig. VII.10.

We find that the main periods passed by $F_S(\lambda)$ are $T_1 = 12$ (months),
$T_2 = 6$ (months) and $T_3 = 4$ (months), which are quite similar to those of
(VII.30), but the result of $S(t)$ is much better than that of $S_{11}(t)$; perhaps
one of the reasons is that the estimates of the phases $\{\theta_k\}$ are involved
in $S(t)$.

Fig. VII.10 Frequency response function $F_S(\lambda)$ of the filter for decomposing
the seasonal component $S_{11}(t)$ (horizontal axis in units of $\pi/64$).

Finally, from Fig. VII.4, we may see that there is statistical correlation in
the irregular component $I(t)$, i.e. it is not a white noise series. This fact has
been pointed out by Cleveland & Tiao (1976). The correlation coefficients of the
residuals are as follows:

    k     Correlation      k     Correlation
    1     -0.2087          9     0.0139
    2     -0.1676          10    0.0838
    3     -0.1100          11    0.2339
    4     0.0562           12    -0.2530
    5     -0.007           13    0.1747
    6     0.0301           14    0.0275
    7     -0.1433          15    0.0302
    8     -0.1355

We may see that it is not a white noise. Indeed, by M.E. model fitting with the
Marple algorithm and BIC order selection, we obtain an AR(12) model, whose
spectrum involves many high frequency components.

In our study, we have taken this fact into account and fitted an AR(P) model to
the residual series $Z(t)$. A better result is obtained in our forecasting.

Recently, Chen & Huzii (1992) exhibited some good properties of the simple
exponential smoothing predictor, which may be found in the paper of Makridakis
et al. (1984). We also applied the exponential smoothing predictor

$$
\hat{x}_{t+1} = (1 - \alpha)\hat{x}_t + \alpha x_t, \quad 0 < |\alpha| < 1,
$$

to the freight transportation data (readers may find some basic results on this
procedure in A.2, Appendix VII). The optimum parameter $\alpha$, obtained by
minimizing

$$
\sum_{n=0}^{M-1} e_n(\alpha)^2 = \sum_{n=0}^{M-1} \left(x(n + 1) - \hat{x}(n + 1)\right)^2,
\qquad \text{(VII.46)}
$$

is $\alpha = 0.86$.

Therefore, we have the following prediction formula

$$
\begin{aligned}
\hat{x}_M(n) &= 0.14^n x(M) + 0.86 \sum_{s=0}^{n-1} 0.14^s\, \hat{x}(M), \\
\hat{x}(M) &= 0.86 \sum_{s=0}^{M-1} 0.14^s x(M - 1 - s),
\end{aligned}
\qquad \text{(VII.47)}
$$

where $n = 1, 2, \ldots, 10$. The result is listed in Table VII.7. Its AEP = 4.4% <
4.68%, the AEP of our method, but unfortunately such a predictor could not offer
a good forecast for the extreme values ($k = 4, 5$).


Table VII.7

  n     Real     Pred.     EP
  1    15537    16251     4.6%
  2    15992    16099     0.7%
  3    16945    16078     5.1%
  4    19391    16076    17.1%
  5    20182    16075    20.3%
  6    16861    16075     4.7%
  7    15894    16075     1.1%
  8    16874    16075     4.7%
  9    16103    16075     0.2%
 10    16227    16075     0.9%


Several methods for making 10-step forward predictions are illustrated in Fig.
VII.11.

Fig. VII.11 Comparison of several forecasting methods.
Appendix VII


A.1 On the X-11 processing procedure.

Suppose that {x(t), t = 1, 2, ..., M} are observations; then three components
are considered in the X-11 procedure:

$$x(t) = T_{11}(t) + S_{11}(t) + I_{11}(t),$$

where $T_{11}(t)$ is the trend component, $S_{11}(t)$ the seasonal component and $I_{11}(t)$ the
irregular component. To decompose each component, the following procedure is
carried out with a series of filtrations:

Step 1. Filter the x(t) series by the filter $F_1$ to obtain a preliminary estimate
$\tilde{T}_{11}(t)$ of the trend $T_{11}(t)$. The filtering coefficients of $F_1$ in the time domain are

$$h_0 = \frac{1}{12}, \qquad h_6 = h_{-6} = \frac{1}{24}, \qquad h_k = h_{-k} = \frac{1}{12}, \quad k = 1, 2, 3, 4, 5. \tag{VII.48}$$

Step 2. Put $y(t) = x(t) - \tilde{T}_{11}(t)$, and filter it by the filter $F_2$ to get a preliminary
seasonal component $\tilde{S}_{11}(t)$. The filtering coefficients of $F_2$ are:

$$h_0 = 0.333, \qquad h_{12} = h_{-12} = 0.222, \qquad h_{24} = h_{-24} = 0.111,$$

$$h_k = h_{-k} = 0 \quad \text{when } k \neq 0, 12, -12, 24, -24. \tag{VII.49}$$

Step 3. Let $Z(t) = x(t) - \tilde{S}_{11}(t)$, and input it to the filter $F_3$ to obtain the
trend estimate $T_{11}(t)$. The coefficients of the filter $F_3$ are

$$h_0 = 0.24, \quad h_1 = h_{-1} = 0.214, \quad h_2 = h_{-2} = 0.147, \quad h_3 = h_{-3} = 0.066,$$

$$h_4 = h_{-4} = 0.0, \quad h_5 = h_{-5} = -0.028, \quad h_6 = h_{-6} = -0.019. \tag{VII.50}$$

Step 4. The seasonal component $S_{11}(t)$ is

$$S_{11}(t) = F_4\bigl(x(t) - T_{11}(t)\bigr),$$

where

$$T_{11}(t) = F_3\bigl(x(t) - F_2(x(t) - \tilde{T}_{11}(t))\bigr), \qquad \tilde{T}_{11}(t) = F_1(x(t)), \tag{VII.51}$$

and the coefficients of $F_4$ in the time domain are:

$$h_0 = 0.2, \quad h_{12} = h_{-12} = 0.2, \quad h_{24} = h_{-24} = 0.13, \quad h_{36} = h_{-36} = 0.07,$$

$$h_k = 0 \quad \text{when } k \neq 0, 12, -12, 24, -24, 36, -36. \tag{VII.52}$$

Step 5. The residual is defined as $I_{11}(t) = x(t) - T_{11}(t) - S_{11}(t)$.


Evidently, since the filterings of Steps 1-5 are all linear operations, the
filters can be considered as linear systems. Therefore, we may have an overall
synthesis system as shown in Fig. VII.12.

Fig. VII.12 An overall synthesis system for X-11

The frequency response function for $T_{11}(t)$ is

$$F_T(\lambda) = F_3(\lambda)\bigl(1 - F_2(\lambda)(1 - F_1(\lambda))\bigr) \tag{VII.53}$$

and

$$F_S(\lambda) = F_4(\lambda)\bigl(1 - F_T(\lambda)\bigr) \tag{VII.54}$$

is the frequency response function for $S_{11}(t)$; both have been illustrated in Fig.
VII.8 and VII.9.

The overall filtering coefficients $\{h_T(k)\}$ for the trend $T_{11}(t)$ and
$\{h_S(k)\}$ for the seasonal component $S_{11}(t)$ are illustrated in Fig. VII.13 and VII.14,
and their values are exhibited on the following page.

Accordingly, we may obtain $T_{11}(t)$ and $S_{11}(t)$ directly from $\{h_T(k)\}$ and $\{h_S(k)\}$:

$$T_{11}(t) = \sum_{k=-36}^{36} h_T(k)\,x(t+k), \qquad S_{11}(t) = \sum_{k=-36}^{36} h_S(k)\,x(t+k), \qquad t = 37, 38, \dots, M-36. \tag{VII.55}$$
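Since all the filters are linear, the overall coefficients may be obtained by convolving the coefficient sequences of $F_1$ to $F_4$ in the order given by (VII.53) and (VII.54). The following is only a minimal sketch under that reading; the helper names (`symmetric`, `seasonal`, `convolve`, `identity_minus`) are ours and not part of the X-11 program, and the rounded coefficients of (VII.48)-(VII.52) are used, so the output can only be expected to be close to the tabulated values. Note that the cascade for the seasonal filter has a longer support than the range of k used in (VII.55).

```python
def symmetric(half):
    """Centered filter [h_-m, ..., h_0, ..., h_m] built from [h_0, ..., h_m]."""
    return list(reversed(half[1:])) + list(half)

def seasonal(weights, period=12):
    """Symmetric filter whose only nonzero taps sit at multiples of `period`."""
    half = [0.0] * ((len(weights) - 1) * period + 1)
    for i, w in enumerate(weights):
        half[i * period] = w
    return symmetric(half)

def convolve(a, b):
    """Coefficients of the cascade of two linear filters."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def identity_minus(h):
    """Coefficients of the filter (1 - h) for a centered filter h."""
    out = [-v for v in h]
    out[len(h) // 2] += 1.0
    return out

F1 = symmetric([1 / 12] * 6 + [1 / 24])                            # (VII.48)
F2 = seasonal([0.333, 0.222, 0.111])                               # (VII.49)
F3 = symmetric([0.24, 0.214, 0.147, 0.066, 0.0, -0.028, -0.019])   # (VII.50)
F4 = seasonal([0.2, 0.2, 0.13, 0.07])                              # (VII.52)

# Time-domain counterparts of (VII.53) and (VII.54).
h_T = convolve(F3, identity_minus(convolve(F2, identity_minus(F1))))
h_S = convolve(F4, identity_minus(h_T))

# Central coefficients h_T(0) and h_S(0), for comparison with the tables.
print(len(h_T), round(h_T[len(h_T) // 2], 4), round(h_S[len(h_S) // 2], 4))
```

The central values printed here may be compared with the first entries of the two coefficient tables; small differences are to be expected from the rounding of the filter weights.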

In the X-11 procedure, as we have shown before, the last segment of output with
indices (M - 36) + 1, (M - 36) + 2, ..., M cannot be obtained, and it is apparent
that these data are quite useful for constructing a forecasting model. Hence, the
users have to supplement them by some extrapolation method. Several methods are
available for such an extension.

(1). Simple extending.

Put

$$\hat{x}(M - 36 + k) = x(M - 36), \tag{VII.56}$$

where k = 1, 2, ..., 36.







Coefficients of {h_T(k)}

  k    h_T(k)          k    h_T(k)
  0    1.8789E-1      20    1.0065E-2
  1    1.7068E-1      21    1.7580E-3
  2    1.2612E-1      22   -7.5370E-3
  3    7.1790E-2      23   -1.4845E-2
  4    2.6787E-2      24   -1.7427E-2
  5    6.4478E-3      25   -1.4237E-2
  6    1.4577E-2      26   -6.6710E-3
  7    2.7157E-2      27    2.0160E-3
  8    1.9278E-2      28    8.3608E-3
  9    3.6450E-3      29    9.8058E-3
 10   -1.4641E-2      30    6.7155E-3
 11   -2.9386E-2      31    2.5151E-3
 12   -3.4854E-2      32    8.5222E-4
 13   -2.8778E-2      33   -1.2897E-4
 14   -1.3775E-2      34   -4.3299E-4
 15    3.9030E-3      35   -3.0406E-4
 16    1.7574E-2      36   -8.7545E-5
 17    2.2127E-2      37    6.5702E-9
 18    2.0146E-2      38   -6.0254E-9
 19    1.4836E-2      39   -2.2774E-8


Coefficients of {h_S(k)}

  k    h_S(k)          k    h_S(k)
  0    1.8091E-1      20   -1.0749E-2
  1   -1.8704E-2      21   -1.0663E-2
  2   -1.7661E-2      22   -1.0907E-2
  3   -1.6352E-2      23   -1.1270E-2
  4   -1.5180E-2      24    1.1849E-1
  5   -1.4528E-2      25   -1.1531E-2
  6   -1.3820E-2      26   -1.1279E-2
  7   -1.3581E-2      27   -1.0774E-2
  8   -1.4204E-2      28   -1.0016E-2
  9   -1.6073E-2      29   -9.1282E-3
 10   -1.8272E-2      30   -8.2852E-3
 11   -2.0515E-2      31   -7.4546E-3
 12    1.7864E-1      32   -6.5620E-3
 13   -2.0676E-2      33   -5.8277E-3
 14   -1.8702E-2      34   -5.3284E-3
 15   -1.6141E-2      35   -5.1005E-3
 16   -1.3753E-2      36    6.4884E-2
 17   -1.2248E-2      37   -5.3620E-3
 18   -1.1591E-2      38   -5.7003E-3
 19   -1.1291E-2      39   -5.9389E-3



Fig. VII.14 Overall filtering coefficients for the seasonal 
component. 

(2). Simple exponential smoothing predictor.

Let $M_n = M - 36$, then

$$\hat{x}(M_n + k) = (1-\alpha)^k\,\hat{x}(M_n) + \alpha\sum_{s=0}^{k-1}(1-\alpha)^s\,x(M_n),$$

where k = 1, 2, ..., 36, and $\alpha$ is determined by minimizing the error

$$Q(\alpha) = \sum_{n=0}^{M-1}\bigl(x(n+1) - \hat{x}(n+1)\bigr)^2 = \min_{\alpha}. \tag{VII.59}$$


(3). ARIMA modelling.

Cleveland and Tiao (1976) studied the X-11 procedure and found that it can be
approximately fitted by a seasonal ARIMA model of the overall form

$$(1-B)(1-B^{12})\,y(t) = (1 - 0.337B + 0.144B^{2} + 0.141B^{3} + 0.139B^{4} + 0.136B^{5} + 0.131B^{6} + 0.125B^{7} + 0.117B^{8}$$
$$\qquad + 0.106B^{9} + 0.093B^{10} + 0.077B^{11} - 0.417B^{12} + 0.232B^{13} - 0.001B^{20} - 0.003B^{21} - 0.004B^{22}$$
$$\qquad - 0.006B^{23} + 0.035B^{24} - 0.021B^{25})\,\varepsilon(t), \tag{VII.60}$$

where $\varepsilon(t)$ is a white noise, normally distributed with zero mean.

Then by the extrapolation method of the ARIMA model (see Box and Jenkins (1970))
we may obtain the extended values $\hat{x}(M_n + k)$, k = 1, 2, ..., 36.

(4). Maximum Entropy modelling.

Take the first-order difference of the data x(t),

$$z(t) = x(t+1) - x(t),$$

to obtain a comparatively stationary sequence, then fit an AR(P) model by the
M.E. method with an appropriate algorithm and order selection. The extrapolation
procedure for AR(P) is easy to calculate and has been presented before.


A.2 Simple exponential smoothing predictor. 

Makridakis et al. (1982) introduced many kinds of prediction methods, one of
which is the simple exponential smoothing predictor.

Let x(0), x(1), ..., x(M) be observation data; the forecasting procedure of this
predictor is as follows:

(1). For constructing the forecasting formula, put

$$\hat{x}(t+1) = (1-\alpha)\,\hat{x}(t) + \alpha\,x(t), \tag{VII.61}$$

where $0 < |\alpha| < 1$, t = 1, 2, ..., M - 1, and $\hat{x}(1) = \alpha x(0)$, and find the optimum
$\alpha^*$, $-1 < \alpha^* < 1$, which minimizes the error

$$Q(\alpha) = \sum_{n=0}^{M-1}\bigl(x(n+1) - \hat{x}(n+1)\bigr)^2.$$

(2). The kth step forward forecast is

$$\hat{x}(M+k) = \alpha\,x(M) + (1-\alpha)\,\hat{x}(M+k-1), \tag{VII.62}$$

where k = 1, 2, ..., N.

It is not difficult to give overall representations of (VII.61)-(VII.62) in clearer
forms. Put

$$t = 0:\quad \hat{x}(1) = \alpha\,x(0),$$

$$t = n:\quad \hat{x}(n+1) = \alpha\sum_{s=0}^{n}(1-\alpha)^s\,x(n-s) \tag{VII.63}$$

and find the optimum $\alpha^*$ minimizing the squared error $Q(\alpha)$. Then the extrapolation
formula is

$$\hat{x}(M+k) = (1-\alpha)^k\,\hat{x}(M) + \alpha\sum_{s=0}^{k-1}(1-\alpha)^s\,x(M). \tag{VII.64}$$
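The recursion (VII.61)-(VII.62) and the closed form (VII.64) may be checked against each other numerically. The sketch below is only illustrative: the data are made up, the function names are ours, and the minimization of $Q(\alpha)$ is done by a plain grid search over $0 < \alpha < 1$, which is an assumption rather than the optimizer used in the study.

```python
def ses_one_step(x, alpha):
    """One-step predictions xhat(1), ..., xhat(M) by (VII.61), with xhat(1) = alpha * x(0)."""
    xhat = [alpha * x[0]]
    for t in range(1, len(x) - 1):
        xhat.append((1 - alpha) * xhat[-1] + alpha * x[t])
    return xhat  # xhat[i] is the prediction of x[i + 1]

def squared_error(x, alpha):
    """Q(alpha): the sum of squared one-step prediction errors."""
    return sum((x[i + 1] - p) ** 2 for i, p in enumerate(ses_one_step(x, alpha)))

def best_alpha(x, grid_size=99):
    """Grid search over 0 < alpha < 1 (an assumed, simple minimizer)."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(grid, key=lambda a: squared_error(x, a))

def forecast(x, alpha, steps):
    """k-step forecasts by the recursion (VII.62), k = 1, ..., steps."""
    prev = ses_one_step(x, alpha)[-1]  # xhat(M)
    out = []
    for _ in range(steps):
        prev = alpha * x[-1] + (1 - alpha) * prev
        out.append(prev)
    return out

def forecast_closed_form(x, alpha, steps):
    """The same forecasts computed from the closed form (VII.64)."""
    xhat_m = ses_one_step(x, alpha)[-1]
    return [(1 - alpha) ** k * xhat_m
            + alpha * sum((1 - alpha) ** s for s in range(k)) * x[-1]
            for k in range(1, steps + 1)]

series = [12.0, 15.0, 14.0, 16.0, 19.0, 18.0, 20.0]  # made-up observations x(0), ..., x(M)
a = best_alpha(series)
print(a, forecast(series, a, 3))
```

Because the weights $\alpha(1-\alpha)^s$ decay geometrically, the k-step forecasts approach x(M) quickly as k grows, which agrees with the nearly constant predictions in Table VII.7.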

3=0 


A.3 Program for fitting a spline function. 

The following program fits data by spline polynomials of order 3. N is the total
number of data points to be fitted by the spline function. x(1), x(2), ..., x(N)
are the x-coordinates; y(1), y(2), ..., y(N) are the data values
corresponding to these x-coordinates (they are listed in program lines
25 and 30 respectively). $D_y(k) = (\hat{y}_k - y(k))^2$ may be selected as 0.1 ~ 0.01.

A(i), B(i), C(i), D(i) are the constant term and the first-order, second-order and
third-order coefficients of the fitted polynomial in the segment x(i) <= x < x(i + 1).

Func(k), k = 1 to x(N), are the fitted values of the spline function.

5 PRINT "input N=?":INPUT N
10 N1=1:N2=N:S=N
15 DIM X(N), Y(N), DY(N), A(N), B(N), C(N), D(N)
20 DIM R(N+1), R1(N+1), R2(N+1), T(N+1), T1(N+1), U(N+1), V(N+1)
25 DATA 1, 11, 33, 40, 45, 51, 58
30 DATA 13214, 12948, 15065, 15437, 15644, 15736, 16243
33 REM read the x-coordinates (line 25) and the data values (line 30)
35 FOR K=1 TO N
40 READ X(K):DY(K)=.11
45 NEXT K
50 FOR K=1 TO N
55 READ Y(K)
60 NEXT K
63 REM initialize the work arrays and set up the band matrices
65 M1=N1-1:M2=N2+1
70 R(M1)=0:R(N1)=0:R1(N2)=0:R2(N2)=0
75 R2(M2)=0:U(M1)=0:U(N1)=0:U(N2)=0:U(M2)=0
80 P=0:M1=N1+1:M2=N2-1
85 H=X(M1)-X(N1):F=(Y(M1)-Y(N1))/H
90 FOR I=M1 TO M2
95 G=H:H=X(I+1)-X(I)
100 E=F:F=(Y(I+1)-Y(I))/H
105 A(I)=F-E:T(I)=2*(G+H)/3
110 T1(I)=H/3:R2(I)=DY(I-1)/G
115 R(I)=DY(I+1)/H
120 R1(I)=-DY(I)/G-DY(I)/H
125 NEXT I
130 FOR I=M1 TO M2
135 B(I)=R(I)^2+R1(I)^2+R2(I)^2
140 C(I)=R(I)*R1(I+1)+R1(I)*R2(I+1)
145 D(I)=R(I)*R2(I+2)
150 NEXT I
155 F2=-S
158 REM iteration on the smoothing parameter P: forward elimination
160 FOR I=M1 TO M2
165 R1(I-1)=F*R(I-1)
170 R2(I-2)=G*R(I-2)
175 R(I)=1/(P*B(I)+T(I)-F*R1(I-1)-G*R2(I-2))
180 U(I)=A(I)-R1(I-1)*U(I-1)-R2(I-2)*U(I-2)
185 F=P*C(I)+T1(I)-H*R1(I-1)
190 G=H:H=D(I)*P
195 NEXT I
198 REM back substitution (the loop must run downwards)
200 FOR I=M2 TO M1 STEP -1
205 U(I)=R(I)*U(I)-R1(I)*U(I+1)-R2(I)*U(I+2)
210 NEXT I
215 E=0:H=0
220 FOR I=N1 TO M2
225 G=H
230 H=(U(I+1)-U(I))/(X(I+1)-X(I))
235 V(I)=(H-G)*DY(I)^2
240 E=E+V(I)*(H-G)
245 NEXT I
250 G=-H*DY(N2)^2
255 V(N2)=G:E=E-G*H
260 G=F2:F2=E*P^2
265 IF (S<=F2)OR(F2<=G) THEN GOTO 320
270 F=0:H=(V(M1)-V(N1))/(X(M1)-X(N1))
275 FOR I=M1 TO M2
280 G=H:H=(V(I+1)-V(I))/(X(I+1)-X(I))
285 G=H-G-R1(I-1)*R(I-1)-R2(I-2)*R(I-2)
290 F=F+G*R(I)*G:R(I)=G
295 NEXT I
300 H=E-P*F
305 IF H<=0 THEN GOTO 320
310 P=P+(S-F2)/((SQR(S/E)+P)*H)
315 GOTO 160
318 REM compute and print the polynomial coefficients A, B, C, D
320 FOR I=N1 TO N2
325 A(I)=Y(I)-P*V(I):C(I)=U(I)
330 NEXT I
335 FOR I=N1 TO M2
340 H=X(I+1)-X(I)
345 D(I)=(C(I+1)-C(I))/(3*H)
350 B(I)=(A(I+1)-A(I))/H-(H*D(I)+C(I))*H
355 NEXT I
360 FOR I=N1 TO N2
365 LPRINT "I=";I,:LPRINT "A(";I;")=";A(I),"B(";I;")=";B(I),
370 LPRINT "C(";I;")=";C(I),"D(";I;")=";D(I)
375 NEXT I:LPRINT " "
378 REM evaluate the fitted spline at the integer points 1 to X(N)
380 IX=INT(X(N))
385 DIM FUNC(IX),I(N)
390 FOR K=1 TO N-1
395 I(K)=INT(X(K)):I(K+1)=INT(X(K+1))
400 FOR I=I(K) TO I(K+1)
405 FUNC(I)=((D(K)*(I-X(K))+C(K))*(I-X(K))+B(K))*(I-X(K))+A(K)
410 NEXT I:NEXT K
415 FOR K=1 TO IX
420 LPRINT "FUNC(";K;")=";FUNC(K);:NEXT K





CASE VIII 

The Water Flow Prediction in Xiang River* 


1. Introduction 

The forecast of water flow in the rivers of China is no doubt a very important
task, especially in the rainy season in summer. When a river overflows its
banks, it can become a disaster for the people living along the shore, for the
turbulent torrent may destroy everything.

One of the institutes in south China studied the forecast problem of water flow 
in the Xiang river, but the researchers there did not succeed. They approached 
us and we suggested the method we used in Case VII of this book, but the result 
was not satisfactory to them, for they were particularly interested in having a good 
prediction of very large values of the flood water. 

Figure VIII.1 is the water flow record from 1958-1981 in the Xiang river in south 
China: 

Someone suggested the Adaptive-Response-Rate of Single-Exponential Smoothing
(ARRSES, see Makridakis & Wheelwright (1983)) procedure, which is a good
algorithm for forecasting, provided that no trend component is involved in the
record.

The basic idea of ARRSES is based on the SES which has been introduced in
Case VII (see A.2 in Appendix VII) but with an adaptive gain $\alpha_t$ at time t, i.e. the
constant $\alpha$ in SES now changes automatically in ARRSES when some factors
change in the observation data.

The basic formula for forecasting in ARRSES is

$$\hat{x}_{t+1} = \alpha_t x_t + (1-\alpha_t)\,\hat{x}_t, \tag{VIII.1}$$

where

$$\alpha_{t+1} = \left|\frac{E_t}{M_t}\right|, \tag{VIII.2}$$

*The results in this case study are mainly the third part of the master's thesis of Mr. Li Dongfeng and
were presented at the 4th China-Japan Symposium on Statistics in 1991. The subject was
suggested by Prof. Xie Shihao, Changsha Institute of Railway.
and $E_t$, $M_t$ satisfy the following equations respectively:

$$E_t = \beta e_t + (1-\beta)E_{t-1}, \tag{VIII.3}$$

$$M_t = \beta|e_t| + (1-\beta)M_{t-1}, \tag{VIII.4}$$

$$e_t = x_t - \hat{x}_t. \tag{VIII.5}$$

$\beta = 0.2$ is suggested in Makridakis et al. (1982).

Suppose that x(1), x(2), ..., x(M) are observations; then we may start from some
initial values to obtain $e_1$, $E_1$, $M_1$ for calculating $\alpha_2$, then $e_2$, $E_2$, $M_2$ for
$\hat{x}_3$, and so forth. Evidently, we may obtain the k-step forward prediction formula as

$$\hat{x}_{M+k} = \alpha_M x_M + (1-\alpha_M)\,\hat{x}_{M+k-1}, \qquad k = 1, 2, \dots, K. \tag{VIII.6}$$
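The ARRSES recursion is short enough to be written out in full. The following is a minimal sketch and not the program used in the study: the starting values (first forecast equal to the first observation, smoothed errors equal to zero, initial gain 0.2) and the freezing of the gain at its last used value during the k-step extrapolation are our assumptions, since the text only says "some initial values".

```python
def arrses(x, beta=0.2, alpha_start=0.2, steps=3):
    """Run the adaptive recursion over x and return (k-step forecasts, next gain)."""
    xhat = x[0]            # assumed initial forecast
    alpha = alpha_start    # assumed initial gain
    E = M = 0.0            # assumed initial smoothed errors
    gain = alpha
    for xt in x:
        e = xt - xhat                           # forecast error e_t
        E = beta * e + (1 - beta) * E           # smoothed error E_t
        M = beta * abs(e) + (1 - beta) * M      # smoothed absolute error M_t
        gain = alpha                            # gain alpha_t used at this step
        xhat = gain * xt + (1 - gain) * xhat    # one-step forecast for time t + 1
        if M > 0:
            alpha = abs(E / M)                  # gain alpha_{t+1} for the next step
    forecasts = [xhat]                          # forecast for time M + 1
    for _ in range(steps - 1):                  # further steps with the gain frozen
        forecasts.append(gain * x[-1] + (1 - gain) * forecasts[-1])
    return forecasts, alpha

flow = [513.0, 335.0, 488.0, 1250.0, 1840.0, 2150.0]  # first "Real" values of Table VIII.1
print(arrses(flow))
```

Since $|E_t| \le M_t$ when both start from zero, the gain stays in [0, 1]; it moves towards 1 when the errors keep the same sign and towards 0 when they alternate.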

We use the ARRSES algorithm for prediction of the water flow in the Xiang
river, list the result in Table VIII.1, and illustrate it in Fig. VIII.2.

We may see that not only is the average error percentage large, but, regrettably,
the method does not offer good forecasts of the high values.

2. Constructing a prediction formula based on the hidden periodicities 
by the quantile method 

From Fig. VIII.1 we may notice two features which are different from the figure 
of freight transportation (see Fig. VII.1). 


Table VIII.1

  k    Real    Pred.    EP(%)
  1     513     441     13.8
  2     335     366      9.2
  3     488     380     22.1
  4    1250     549     56.0
  5    1840     890     51.6
  6    2150    1839     14.5
  7    4520    2206     51.2
  8    5060    1885     62.7
  9    1900    1190     37.4
 10    1690     768     54.6
 11    1750     483     72.4
 12     704     435     38.3

AEP=40.3%


Fig. VIII.2 Comparison of prediction values by ARRSES
and the real record.

(1). In the record of freight transportation, a trend is easily seen in the figure.
However, in Fig. VIII.1 we could not find any apparent sign of a trend.

(2). In Fig. VIII.1 we may see that there are many strong peak values which are
quite asymmetric with respect to the mean value.

The first feature shows that the main problem in flow forecasting is not
"how to decompose and to construct a mathematical formula for the
trend" as we have done in Case VII; the second feature shows that it is not easy
to fit a model directly to such a record, since the functions which are usually
used in model fitting, say sin(x) and cos(x), cannot easily reproduce such a form
with only a few terms. A logarithmic transform is often very useful to reduce the
asymmetry of the form. For example, we have the following artificial series and
its ln(x(t)) transformation:

 x(t):       7      55     403    55     7      55     7      55    ...
 ln(x(t)):   1.94   4.00   6.00   4.00   1.94   4.00   1.94   4.00  ...      (VIII.7)

The figures of these series are illustrated in Fig. VIII.3. 



Fig. VIII.3 An artificial series and its log-transform. 


We note that the original data x(t) are severely asymmetric with respect to the
mean value, whereas ln(x(t)) has a much milder form in the figure.

In the general case, the log-transformation can be taken as

$$y(t) = \log_a\bigl(x(t) - b\bigr) \tag{VIII.8}$$

where a and b are constants to be chosen appropriately.
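The effect of the transformation on the artificial series may be verified directly. The sketch below is only illustrative; the function name `log_transform` and its defaults (natural logarithm, b = 0) are our choices.

```python
import math

def log_transform(x, a=math.e, b=0.0):
    """y(t) = log_a(x(t) - b); every x(t) must exceed b."""
    return [math.log(v - b, a) for v in x]

x = [7, 55, 403, 55, 7, 55, 7, 55]   # the artificial series of the text
y = log_transform(x)

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
# Largest excursions above and below the mean, before and after the transform.
print(max(x) - mean_x, mean_x - min(x))
print(round(max(y) - mean_y, 2), round(mean_y - min(y), 2))
```

Before the transformation the upward excursion is several times the downward one; after it the two are of comparable size, which is what makes a fit with a few sinusoidal terms more plausible.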

After the log-transformation, a good harmonic component adjustment is necessary
for y(t), since there are strong seasonalities involved in the original record (Fig.
VIII.1).

Now, we want to introduce, in the general case, a more effective algorithm than
HSY (see Chapter 3) for extracting the harmonic components by an order statistic
approach, which was introduced by Li in 1990 and called the Q-method.

Suppose that the observation model is

$$y(t) = \sum_{k=1}^{P} A_k\cos(\lambda_k t + \varphi_k) + \varepsilon(t), \tag{VIII.9}$$

where P, $\{A_k, \lambda_k, k = 1, 2, \dots, P\}$ are unknown constants, $\{\varphi_k\}$ are iid random variables
with $U(-\pi, \pi)$ distribution, and $\varepsilon(t)$ is the noise satisfying some mathematical
conditions (see A.1 in Appendix VIII).

Put

$$I_N(\lambda) = \frac{1}{2\pi N}\left|\sum_{k=1}^{N} y(k)\,e^{-ik\lambda}\right|^2, \qquad \lambda \in \Pi, \tag{VIII.10}$$

$$I^{(j)} = I_N\!\left(\frac{2\pi j}{N}\right), \qquad j = 1, 2, \dots, N, \tag{VIII.11}$$

and arrange $\{I^{(j)}, j = 1, 2, \dots, N\}$ as order statistics

$$0 \le I_{(1)} \le I_{(2)} \le \dots \le I_{(N)}. \tag{VIII.12}$$

Then define a random variable

$$A_Q = I_{(M)}\,N^{1/4} \tag{VIII.13}$$

where M = [0.75N].

Now, let J be the set

$$J = \{j_1 < j_2 < \dots < j_m\} = \{j : I^{(j)} > A_Q\} \tag{VIII.14}$$

and define the estimated order $\hat{P}$ as follows:

a. Put $\hat{P} = 0$ if J is an empty set.

b. Otherwise, make a partition $\{J_k\}$ of J,

$$J = J_1 \cup J_2 \cup \dots \cup J_{\hat{P}}, \tag{VIII.15}$$

such that for all $j_{kl} \in J_l$, $j_{ks} \in J_s$ ($l \neq s$),

$$|j_{kl} - j_{ks}| > [N^{0.55}] + 2 \tag{VIII.16}$$

(see Fig. VIII.4).

Fig. VIII.4 Making partition for the set J.

Suppose that the maximum value of $I^{(j)}$ in the set $J_k$ is at j = j(k), i.e.

$$I^{(j(k))} = \max_{j \in J_k} I^{(j)}, \qquad k = 1, 2, \dots, \hat{P}; \tag{VIII.17}$$

then the hidden frequencies to be estimated are

$$\hat{\lambda}_k = \frac{2\pi j(k)}{N}, \qquad k = 1, 2, \dots, \hat{P}, \tag{VIII.18}$$

and $\hat{P}$ is the estimated order of model (VIII.9).

Theoretically, it has been proved that $\hat{P}$ and $\{\hat{\lambda}_k\}$ are strongly consistent estimates
of the order P and the hidden frequencies $\{\lambda_k\}$. Moreover, Monte-Carlo simulation
shows that the present method is in many cases better than those that have
been widely used, e.g. M.E., etc.
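The steps of the Q-method (periodogram, quantile threshold, partition of the exceeding indices, one peak per cluster) can be sketched in a few lines. This is an illustration under our own assumptions and not Li's program: only the Fourier frequencies in $(0, \pi]$ are examined, the order statistic is taken over those ordinates, and the constants 0.75, 1/4 and 0.55 are used as defaults; the function names are ours.

```python
import cmath
import math
import random

def periodogram(y):
    """Periodogram ordinates at 2*pi*j/N for j = 1, ..., N//2."""
    N = len(y)
    values = []
    for j in range(1, N // 2 + 1):
        lam = 2 * math.pi * j / N
        s = sum(y[k] * cmath.exp(-1j * lam * (k + 1)) for k in range(N))
        values.append(abs(s) ** 2 / (2 * math.pi * N))
    return values

def q_method(y, quantile=0.75, power=0.25, delta=0.55):
    """Return the estimated hidden frequencies; their number is the estimated order."""
    N = len(y)
    I = periodogram(y)
    m = max(int(quantile * len(I)), 1)
    threshold = sorted(I)[m - 1] * N ** power          # quantile times N**power
    J = [j for j in range(1, len(I) + 1) if I[j - 1] > threshold]
    if not J:
        return []                                      # estimated order 0
    gap = int(N ** delta) + 2
    clusters = [[J[0]]]
    for j in J[1:]:                                    # split where indices are far apart
        if j - clusters[-1][-1] > gap:
            clusters.append([j])
        else:
            clusters[-1].append(j)
    peaks = [max(c, key=lambda j: I[j - 1]) for c in clusters]
    return [2 * math.pi * j / N for j in peaks]

random.seed(1)
N = 256
y = [3 * math.cos(2 * math.pi * 20 * t / N + 0.3) + random.gauss(0, 1)
     for t in range(1, N + 1)]
print(q_method(y))
```

The threshold is relative to a quantile of the periodogram itself, so it adapts to the noise level; the factor $N^{1/4}$ makes false exceedances by pure noise rarer as the sample size grows.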

Now, the forecasting procedure for the water flow of the Xiang river is as follows
(we will call it LDSARMA in the sequel):

(1). Transform the original data into logarithms according to the following
equation

$$x(t) = \ln\bigl(y(t) - 100\bigr), \qquad t = 1, 2, \dots, 250. \tag{VIII.19}$$

(2). Take the first-order difference of x(t):

$$\tilde{x}(t+1) = x(t+1) - x(t), \qquad t = 2, 3, \dots, 250. \tag{VIII.20}$$

(3). Estimate the order P and detect the frequencies $\{\lambda_k\}$ by the Q-method
introduced above.

(4). Based on (3), minimize the following sum of squared errors

$$\inf_{\{A_k, \lambda_k, \varphi_k\}} \sum_{t}\left(x(t) - \sum_{k=1}^{P} A_k\cos(\lambda_k t + \varphi_k)\right)^2 \tag{VIII.21}$$

to obtain more accurate parameter estimates.

(5). Put

$$e(t) = x(t) - \sum_{k=1}^{P} A_k\cos(\lambda_k t + \varphi_k), \qquad t = 2, 3, \dots, 250, \tag{VIII.22}$$

and fit an ARMA(p, q) model to e(t).

The forecasting result of LDSARMA is listed in Table VIII.2 and
illustrated in Fig. VIII.5.


Table VIII.2

  k    Real    Pred.    EP(%)
  1     513     552      7.5
  2     335     600     79.3
  3     488     615     26.1
  4    1250     836     33.1
  5    1840    1456     20.9
  6    2150    2865     33.3
  7    4520    5541     22.6
  8    5060    4508     10.9
  9    1900    3183     67.5
 10    1690    1729      2.3
 11    1750    1245     28.8
 12     704    1109     57.9

AEP=32.5%


Though the AEP is not very satisfactory, the prediction for such extreme values as 5060
and 4520 may be considered comparatively good.
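The error percentages in Table VIII.2 may be reproduced from the "Real" and "Pred." columns. The definition used below, EP = 100 |Real - Pred.| / Real with AEP as the plain average, is inferred from the tabulated values and is stated here as an assumption; small differences in individual entries come from the rounding of the printed predictions.

```python
real = [513, 335, 488, 1250, 1840, 2150, 4520, 5060, 1900, 1690, 1750, 704]
pred = [552, 600, 615, 836, 1456, 2865, 5541, 4508, 3183, 1729, 1245, 1109]

def error_percentages(real, pred):
    """EP(k) = 100 * |real - pred| / real for each step k (assumed definition)."""
    return [100.0 * abs(r - p) / r for r, p in zip(real, pred)]

ep = error_percentages(real, pred)
aep = sum(ep) / len(ep)   # average error percentage
print([round(v, 1) for v in ep], round(aep, 1))
```

The same two functions can be applied to any column of the comparison tables in this case study.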


3. Comparison and discussion 


Several other forecasting methods have been tried for the same purpose
besides ARRSES, which has been introduced in Section 1 and hence is not repeated
here. The other methods are stated briefly in the following:

Fig. VIII.5 Water flow forecasting by LDSARMA

(1). MSAR

Put x(t) = y(t)/1000, t = 1, 2, ..., 250, where y(t) is the observation data. Then
decompose the frequency components S(t) by the Q-method which is introduced in
Section 2.

Now, let e(t) = x(t) - S(t) and fit an AR(p) model for e(t). The forecasting result
is not so bad (see Table VIII.3 and Fig. VIII.6); the EPs for the extreme values
(k = 7, 8) are the best, even though the AEP is greater than that of LDSARMA.

(2). LSAR

Apply to y(t) the transformation

$$x(t) = \log_{10}\bigl(y(t) - 190\bigr)$$

and detect the hidden frequency components by the Q-method, then fit an AR(p) model
to the residual

$$e(t) = x(t) - S(t).$$

The forecasting result in Table VIII.3 shows that the AEP of LSAR is the best, but
the EPs for k = 7, 8 are comparatively high; its figure is illustrated in Fig. VIII.7.

(3). LDR

In this procedure, after making the log-transformation

$$x(t) = \ln\bigl(y(t) - 100\bigr), \qquad t = 1, 2, \dots, 250,$$

we take the first-order difference of x(t). Finally, we fit the series by a linear regression
model with 12 parameters. The results both in AEP and for the extreme values are not so
good (see Table VIII.3 and Fig. VIII.8).
Fig. VIII.6 Water flow forecasting by MSAR

Fig. VIII.7 Water flow forecasting by LSAR

(4). RMA

The forecasting algorithm of RMA can be found in the book of Makridakis and
Wheelwright (1983) and will be briefly introduced in A.2, Appendix VIII.

Satisfactory results could not be obtained either in AEP or for the extreme
values (see Table VIII.3 and Fig. VIII.9).

Fig. VIII.9 Water flow forecasting by RMA

A comprehensive comparison of all the forecasting methods used in the present
case study is given in Table VIII.3.

It is a very important fact that in LDSARMA as well as in MSAR, the forecasts
for k = 7, 8, i.e. the extreme values, are comparatively good, since the Q-method for
detecting harmonic components has been used in both procedures.
Table VIII.3

  k        Real    ARRSES    LDSARMA    MSAR    LSAR    LDR     RMA
  1         513      441       552       647     465     510     510
  2         335      366       600       564     528     400     409
  3         488      380       615       450     688     404     415
  4        1250      549       836       885    1032     590     588
  5        1840      890      1456      2115    1659     900     939
  6        2150     1839      2865      3657    2504    1823    1918
  7        4520     2206      5541      4608    3126    2068    2280
  8        5060     1885      4508      4406    3017    1833    1935
  9        1900     1190      3183      3310    2280     952    1214
 10        1690      768      1729      2148    1477     745     780
 11        1750      483      1245      1582     942     477     489
 12         704      435      1109      1586     671     431     439
 AEP(%)             40.3      32.5      38.9    25.5    40.9    38.4


Readers may find that the AEPs of the methods in Table VIII.3 are much
higher than the AEP in Case VII. Perhaps one of the reasons is that in Case
VII there exists a large trend component, and as soon as we have a good
fit for the trend, the prediction error will be held low. Conversely, there
is almost no trend in the present case study, and neither adjusting the random harmonic
component nor making a good forecast for the irregular residual is
easy for practical data.

Another approach for improving the accuracy of prediction for data of the
form of Fig. VIII.1 may be considered in the following model

$$y(t) = T(t) + S(t) + P(t) + \varepsilon(t) \tag{VIII.23}$$

where the notations T(t), S(t), $\varepsilon(t)$ are as understood before, and P(t) is the so-called
"pulse-component" and may be represented as

P(t) = (VIII.24) 

k 

where are unknown constants, {^} are random phase and the A(t) may be 

considered as rectangular function, triangle or Dirac function etc. The basic figure 
of P(t) is as in Fig. VIII.10. 

The reason for introducing such a component is: in the general case, by Fourier
analysis, constructing such a pulse series needs a great many sinusoidal components,
which is not easy to achieve in practical applications.

Appendix VIII


A.1 Quantile method for detecting the hidden periodicities

Suppose that the model of the observation is

$$y(n) = S(n) + \xi(n), \qquad n = 0, \pm 1, \dots, \tag{VIII.25}$$

where

$$S(n) = \sum_{k=1}^{P} 2A_k\cos(\lambda_k n + \varphi_k)$$

is the harmonic component, and P, $\{\lambda_k, A_k, k = 1, 2, \dots, P\}$ are unknown parameters
satisfying the conditions of

(1). $0 < \lambda_1 < \lambda_2 < \dots < \lambda_P < \pi$.

(2). $A_k > 0$, k = 1, 2, ..., P.

(3). $\varphi_1, \varphi_2, \dots, \varphi_P$ are iid random variables with $U[-\pi, \pi]$ distribution and $\xi(n)$
is the noise, satisfying the conditions of

a. Ergodic stationary time series in the strict sense. 

b. ^(n) is a linear process, i.e. it can be represented as 

oo 

f(n) = a(k)e(n - k) (VIII.26) 

k=0 

where ^ 

E < oo, P>\' a(0) = 1 

k=0 

and {e(n)} are iid series, e(n) measurable with respect to the cr-field J n = a{x(m) 1 
m < n} and Ee(n) = 0, E\e(n) | 2 = cr 2 , ^|e(n)| 5 < oo. 

c. The characteristic function Q(0) of e(n) satisfies the condition of 

sup Q{0) = P[0q) < 1, V0 O > 0. 

|^|>tfo>0 

d. The spectral density of ^(n), /^(A) > 0, A G FI. Then put 

w) = ▲ f>( n ) e,An ' Aen 

A:= 1 
and carry out the procedure for estimating the order P and $\{\lambda_k\}$ as we have shown in
Section 2 of the present case study (see (VIII.11) to (VIII.16)), where some constants
may be selected in a more general form:

M ^ < P < 1 ； 

A >0, 0 < a < 1; 

A 0 =AMN Q 1 

and (VIII.16) may be changed into 

\jkl-jk,\ > +2, 0<5 < 1. 

Based on the estimating procedure mentioned above, we have the following
theorem:

Theorem. Suppose that {y(t)} is a time series with (VIII.25) as the model and
(VIII.26), and y(1), y(2), ..., y(N) are observations of y(n). Let $\hat{P}$ and $\{\hat{\lambda}_k, k = 1, 2,
\dots, \hat{P}\}$ be the estimates of the order and hidden frequencies respectively; then with
Pr. 1, for sufficiently large N, we have

P =P\ 

|*^/c ~ ^k\ < A ： = 1 ， 2, • . • ，戶 . 

Li (1990) has made a systematic Monte-Carlo study on the comparison of 7 kinds 
of algorithms for detecting the hidden periodicities in different signal-to-noise ratio 
situations and with several types of noise, hidden frequencies and different sample 
sizes etc. Algorithms for comparison in his study include the well known Grenander- 
Rosenblatt test, Maximum Entropy, HSY，Pisarenko and others. Li showed many 
very interesting results and defects of some algorithms are also indicated in his 
study. 


A.2 RMA forecasting method 

This forecasting procedure is called the Ratio-to-Moving Average method in
Makridakis and Wheelwright (1983).

The model of the observation is assumed to be

$$x(t) = I(t) * T(t) * C(t) * E(t) \tag{VIII.27}$$

where I(t) is the seasonal component; T(t) the trend component; C(t) the cyclical
component; E(t) the error or random component.

The first step in RMA is to remove the trend-cyclical component by a moving
average whose number of terms is equal to the length of seasonality, represented
as

$$M(t) = T(t) * C(t) \tag{VIII.28}$$

where M(t) includes mainly the trend and cyclical components, since a large part of
the seasonal and irregular components is eliminated by the moving average.
It follows from (VIII.27) that

$$I(t) * E(t) = \frac{x(t)}{M(t)}. \tag{VIII.29}$$

Then the seasonal component I(t) is obtained by eliminating the irregular
component E(t) in (VIII.29) with some form of medial averaging of the same
months.

The so-called medial averaging arranges the data by month for all years; the
medial average is the average value for each month after the largest and smallest
values have been excluded.
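The medial averaging step is simple to state in code. The sketch below is illustrative only; the function names are ours, and the ratios x(t)/M(t) are assumed to be stored in time order starting with the first month of a year.

```python
def medial_average(values):
    """Mean of the values after one largest and one smallest value are dropped."""
    if len(values) < 3:
        raise ValueError("at least three values are needed")
    trimmed = sorted(values)[1:-1]
    return sum(trimmed) / len(trimmed)

def seasonal_indices(ratios, period=12):
    """Medial average of the ratios x(t)/M(t), taken month by month over all years."""
    return [medial_average(ratios[m::period]) for m in range(period)]

# Made-up ratios for one month observed in five different years.
print(medial_average([0.92, 1.05, 0.98, 1.40, 1.01]))
```

Dropping the extremes keeps a single unusual year (such as the value 1.40 above) from distorting the seasonal index of that month.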

If one wants to obtain the cyclical component C(t), then a regression function $\hat{T}(t)$
may be fitted to the trend T(t) in (VIII.28) and

$$C(t) = \frac{M(t)}{\hat{T}(t)}. \tag{VIII.30}$$

Finally, we may also obtain the irregular component by

$$E(t) = \frac{x(t)}{I(t) * T(t) * C(t)}. \tag{VIII.31}$$
Case IX


Miscellaneous Cases Study


In this section, the author wants to introduce some other case studies in
time series analysis which he has found very illustrative of applying the theory and
methods to practical problems.


IX.1 Long Term Weather Forecasting by Seasonal ARIMA Model*

In long term weather forecasting, the time series analysis method which has
been used in China is stationary AR model fitting and forecasting. However,
it is found that the predicted values and the amplitude of their variations are often
underestimated as compared with the observed data, since in meteorology the seasonal
variation is quite strong. Therefore, mathematically, employing the ARIMA model
introduced by Box-Jenkins (1970) for constructing the long term forecasting model
is worth trying.


IX.1.1 Some Relevant Knowledge. 


(1). Seasonal ARIMA model

A stochastic series x(t) is called an ARIMA(p, d, q) x (P, D, Q)_M model if it
satisfies the equation

$$\left(\sum_{k=0}^{p}\varphi_k U^k\right)\left(\sum_{l=0}^{P}\Phi_l U^{lM}\right)\nabla^d\nabla_M^D\,x(t) = \left(\sum_{k=0}^{q}\theta_k U^k\right)\left(\sum_{l=0}^{Q}\Theta_l U^{lM}\right)\varepsilon(t), \tag{IX.1.1}$$

where $\{\varphi_k\}$, $\{\Phi_l\}$, $\{\theta_k\}$, $\{\Theta_l\}$ are coefficients of the model, U is the shift operator
Ux(t) = x(t - 1), and

$$\nabla_M^D = (1 - U^M)^D, \qquad \nabla^d = (1 - U)^d, \tag{IX.1.2}$$

and M is the seasonal index (for example, M = 12 (months), M = 24 (hours), etc.).
Put

$$A_{p+MP}(U) = \left(\sum_{k=0}^{p}\varphi_k U^k\right)\left(\sum_{l=0}^{P}\Phi_l U^{lM}\right), \qquad B_{q+MQ}(U) = \left(\sum_{k=0}^{q}\theta_k U^k\right)\left(\sum_{l=0}^{Q}\Theta_l U^{lM}\right), \tag{IX.1.3}$$

$$z(t) = \nabla^d\nabla_M^D\,x(t), \tag{IX.1.4}$$

then z(t) satisfies the ARMA(p + MP, q + MQ) model

$$A_{p+MP}(U)\,z(t) = B_{q+MQ}(U)\,\varepsilon(t) \tag{IX.1.5}$$

with sparse coefficients.

*This case study was investigated by Xiang J., Gu L., Huang W. and Cao H. (1980) and has been
briefly introduced in An and Xie (1991).


Example 1. x(t) satisfies the ARIMA(p, 0, q) x (0, 1, 0)_s model, i.e. the model equation
is

$$\varphi(U)\nabla_s\,x(t) = \theta_0\,\theta(U)\,\varepsilon(t). \tag{IX.1.6}$$

Put

$$y(t) = \nabla_s\,x(t) = (1 - U^s)\,x(t) = x(t) - x(t-s), \tag{IX.1.7}$$

then we have

$$\varphi(U)\,y(t) = \theta(U)\,\varepsilon(t), \qquad (\theta_0 = 1), \tag{IX.1.8}$$

namely, y(t) is an ARMA(p, q) series.


Example 2. x(t) satisfies the ARIMA(p, 1, q) x (0, 1, 0)_s model. Now, the model equation
is

$$\varphi(U)\nabla\nabla_s\,x(t) = \theta(U)\,\varepsilon(t). \tag{IX.1.9}$$

Put

$$y(t) = \nabla\nabla_s\,x(t) = (1-U)(1-U^s)\,x(t)$$
$$\qquad = \bigl(x(t) - x(t-s)\bigr) - \bigl(x(t-1) - x(t-1-s)\bigr)$$
$$\qquad = z(t) - z(t-1),$$

where z(t) = x(t) - x(t - s). Then y(t) is an ARMA(p, q) model.
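The two differencing operations of Examples 1 and 2 may be tried on a small artificial series. The sketch below is only an illustration: the function names and the data (a linear trend plus a pattern of period 4) are ours.

```python
def seasonal_difference(x, s):
    """y(t) = x(t) - x(t - s), available for t >= s."""
    return [x[t] - x[t - s] for t in range(s, len(x))]

def double_difference(x, s):
    """y(t) = z(t) - z(t - 1) with z(t) = x(t) - x(t - s)."""
    z = seasonal_difference(x, s)
    return [z[t] - z[t - 1] for t in range(1, len(z))]

pattern = [5.0, -2.0, 0.0, 3.0]                       # made-up seasonal pattern, s = 4
x = [2.0 * t + pattern[t % 4] for t in range(20)]     # linear trend plus seasonality

print(seasonal_difference(x, 4)[:3])
print(double_difference(x, 4)[:3])
```

For this series the seasonal difference removes the periodic pattern and leaves the constant increment of the trend over one season, and the additional ordinary difference removes that constant as well.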
(2). M.L.E. and M.S.S.E. under the normal distribution
Without loss of generality, we may rewrite the ARMA equation as 

P <? 

4>k^(t — k) = € ： (i) -+■ ^ Ok^[t — k), (IX.1.10) 

k—0 k=l 

where Ee 2 (t) = > 0, and denote the parameters as a vector 

A = (^o»^) 

= (^oi »• * * ，彡 p; 汐 1 ， ••- ， 0(j). 


Suppose that e ⑷ is the iid N(0 y a 2 ) random series，x = (xi,... , x^) T is a 
sample, then we have the log-likelihood function as 

logp(x|A) = S log27r + 修 log det(E -1 ) - ■x T E _1 x. (IX.1.11) 

Put 

M n = ^E _1 , (IX.1.12) 


then we have 

det(E _1 ) = det(M")G 2 " (IX.1.13) 


Now, we shall show some important results in the following: 


a. Matrix is free from the parameter Qq. 

We only need to indicate that each element of the matrix is free from 

Si ，In fact, by (IX.1.12), M^ 1 = ^ 2 E, so that ’ 



which shows that each element of is free from the parameter where, in the 

calculation, the stochastic representation of ARMA model is used in (IX.1.14) (see 
Theorem 2.1, Chapter 2). 
b. $|\det M_N| < C$, where C is a constant independent of the sample size N when N > p.

In the following, we shall only present the proof where the model of x(t) is AR(p), 
since the proving is rather cumbersome in the ARMA case. 

In Chapter 2, by (2.89), we have 

detR p+i/+l = ^detR p+I , = (flg) i + 1 det R p (IX. 1.16) 


so that 

detR；^ +1 =5 0 - 2(L+1) detR； 1 . 

Now, put TV = p -f L + 1, L > 0, and by 

we have 

Mp+L+l = ^0^*P+L+1 

so it follows from (IX.1.17) 

detM p+i+l =0^ p+L+l) det(R ； l L+l ) 

=0 i{ P +L + l)^-2(L+l) detR -i) 

=0 o 2p detR； 1 

is independent of the sample size N 、 i.e. 

I det Mf/| < C < oo, N > p. 


(IX.1.17) 

(IX.1.18) 

(IX.1.19) 


(IX.1.20) 

(IX.1.21) 


c. M.S.S.E.

Based on (IX.1.11), (a) and (b), we may see that maximizing the log-likelihood function (IX.1.11) is equivalent to minimizing the function

$$A(\theta_0, \beta) = \frac{N}{2}\log\theta_0^2 + \frac{1}{2\theta_0^2}X^T M_N X \tag{IX.1.22}$$

when $N$ is sufficiently large.

The optimum estimate by (IX.1.22) is called the Asymptotic Log-likelihood Estimate (A.L.E.), and

$$S(\beta) = X^T M_N X \tag{IX.1.23}$$

is called the function of sum of squares, which is independent of the parameter $\theta_0$.

Now, the optimum estimates of $\theta_0$, $\beta$ may be obtained from (IX.1.22) by

$$\begin{aligned}
\frac{\partial}{\partial\theta_0^2}\{A(\theta_0, \beta)\} &= \frac{N}{2\theta_0^2} - \frac{1}{2\theta_0^4}S(\beta) = 0, \\
\frac{\partial}{\partial\beta}\{A(\theta_0, \beta)\} &= \frac{1}{2\theta_0^2}\frac{\partial}{\partial\beta}S(\beta) = 0,
\end{aligned} \tag{IX.1.24}$$

and we call the solution of (IX.1.24) the M.S.S.E., namely

$$\hat{\theta}_0^2 = \frac{1}{N}S(\hat{\beta}), \tag{IX.1.25}$$

where $\hat{\beta}$ is the solution of

$$\frac{\partial}{\partial\beta}S(\beta) = 0. \tag{IX.1.26}$$
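As a small illustration of the two-stage estimate (first minimize the sum of squares over the coefficients, then set the scale to $S/N$), the sketch below treats the AR(1) case $x(t) + \phi_1 x(t-1) = \varepsilon(t)$. It assumes the standard closed form of the exact AR(1) quadratic form, $(1-\phi_1^2)x_1^2 + \sum_{t \ge 2}(x_t + \phi_1 x_{t-1})^2$, which is not quoted from the text; the function names are hypothetical.

```python
def sum_of_squares_ar1(x, phi1):
    """S(phi1) for x(t) + phi1*x(t-1) = eps(t); assumed exact AR(1) quadratic form."""
    head = (1.0 - phi1 ** 2) * x[0] ** 2
    tail = sum((x[t] + phi1 * x[t - 1]) ** 2 for t in range(1, len(x)))
    return head + tail


def msse_ar1(x):
    """Solve dS/dphi1 = 0 in closed form, then set theta0^2 = S(phi1_hat) / N."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    cross = sum(x[t] * x[t - 1] for t in range(1, n))
    inner = sum(x[t] ** 2 for t in range(1, n - 1))  # x_2 .. x_{N-1} in 1-based terms
    phi1 = -cross / inner  # S is quadratic in phi1, so this is its stationary point
    theta0_sq = sum_of_squares_ar1(x, phi1) / n
    return phi1, theta0_sq


if __name__ == "__main__":
    data = [0.5, -1.2, 0.8, 0.3, -0.9, 1.1, -0.4, 0.2]
    print(msse_ar1(data))
```

The closed-form stationary point is not forced to satisfy $|\phi_1| < 1$; a practical fit would still have to check stationarity separately.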


(3). Powell’s algorithm for seeking the extreme value of a convex function
For seeking the extreme value of (IX.1.22), Powell’s algorithm is a very powerful tool. Powell’s method is an iterative algorithm that solves the optimization problem of a convex function by line searches. It differs from algorithms such as steepest descent, which need the first- or second-order derivatives of the function.

Now, suppose that $X = (x_1, x_2, \ldots, x_n)^T$ is a real vector in $R^{(n)}$, and $f(X)$ is a real positive-definite convex function defined in $R^{(n)}$. Powell’s algorithm starts from a point $X^{(k)} \in R^{(n)}$ and finds the minimum value of $f(X)$ along $n$ directions $\eta_1, \eta_2, \ldots, \eta_n$, then changes the point from $X^{(k)}$ to $X^{(k+1)} = X^{(k)} + \lambda\eta$. The detailed description of the algorithm is as follows:

Step 1. Select an initial starting point

$$X^{(0)} = (x_1^0, \ldots, x_n^0)^T \in R^{(n)}$$

and $n$ directions

$$\eta^{(1)}, \eta^{(2)}, \ldots, \eta^{(n)}.$$

As an initial step, we may select

$$\eta^{(i)} = (0, \ldots, 0, \underset{i\text{th}}{1}, 0, \ldots, 0)^T, \quad i = 1, 2, \ldots, n. \tag{IX.1.27}$$

Step 2. For $i = 1, 2, \ldots, n$, select real values $\lambda_i$ such that for

$$X^{(i)} = X^{(i-1)} + \lambda_i\eta^{(i)} \tag{IX.1.28}$$

we have

$$f_i = f(X^{(i)}) = \inf_{\lambda} f(X^{(i-1)} + \lambda\eta^{(i)}), \quad i = 1, 2, \ldots, n, \tag{IX.1.29}$$

where the selection of $\lambda$ should be confined to the region in which the function $f(X)$ remains convex.
Step 3. Put $f_0 = f(X^{(0)})$. For a given $\delta > 0$, if $f_0 - f_n < \delta$, then stop the calculation; the approximate extreme value of $f(X)$ and its corresponding point in $R^{(n)}$ are $f_n$ and $X^{(n)}$ respectively.

Otherwise, proceed to the next step.

Step 4. Find an integer $m$, $1 \le m \le n$, such that

$$f_{m-1} - f_m = \Delta = \max_{1 \le k \le n}\{f_{k-1} - f_k\}. \tag{IX.1.30}$$

Step 5. Put

$$\tilde{f} = f(2X^{(n)} - X^{(0)}), \tag{IX.1.31}$$

and if

$$\tfrac{1}{2}(f_0 - 2f_n + \tilde{f}) \ge \Delta, \tag{IX.1.32}$$

then take $X^{(n)}$ instead of $X^{(0)}$ and seek the extreme value along the original directions $\eta^{(1)}, \eta^{(2)}, \ldots, \eta^{(n)}$ (i.e. return to step 2).

Step 6. If

$$\tfrac{1}{2}(f_0 - 2f_n + \tilde{f}) < \Delta, \tag{IX.1.33}$$

then put

$$\eta = X^{(n)} - X^{(0)} \tag{IX.1.34}$$

and select $\lambda^*$ such that for

$$X^* = X^{(n)} + \lambda^*\eta \tag{IX.1.35}$$

the function $f$ is minimized:

$$f^* = f(X^*) = \inf_{\lambda} f(X^{(n)} + \lambda\eta). \tag{IX.1.36}$$

Now, take $X^*$ instead of $X^{(0)}$, and seek the extreme value along the directions

$$\eta^{(1)}, \ldots, \eta^{(m-1)}, \eta^{(m+1)}, \ldots, \eta^{(n)}, \eta \tag{IX.1.37}$$

by returning to step 2.

Powell’s optimization method may be used in many practical problems for the numerical calculation of the extreme value of a nonlinear function.
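Steps 1 to 6 can be sketched in Python as follows. This is a minimal illustration and not the implementation used in the case study: the golden-section line search, its bracket [-10, 10], the tolerances and the names `line_min` and `powell` are assumptions introduced here. The Step 5 and Step 6 test is taken to be $\frac{1}{2}(f_0 - 2f_n + \tilde f) \ge \Delta$ versus $< \Delta$.

```python
import math


def line_min(f, x, d, lo=-10.0, hi=10.0, tol=1e-10):
    """Golden-section search for lambda in [lo, hi] minimizing f(x + lambda*d).

    Assumes f is unimodal along the line inside the bracket (illustrative choice).
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0

    def along(lam):
        return f([xi + lam * di for xi, di in zip(x, d)])

    a, b = lo, hi
    c, e = b - g * (b - a), a + g * (b - a)
    fc, fe = along(c), along(e)
    while b - a > tol:
        if fc < fe:                      # minimum lies in [a, e]
            b, e, fe = e, c, fc
            c = b - g * (b - a)
            fc = along(c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, e, fe
            e = a + g * (b - a)
            fe = along(e)
    lam = 0.5 * (a + b)
    return [xi + lam * di for xi, di in zip(x, d)]


def powell(f, x0, delta=1e-12, max_iter=200):
    n = len(x0)
    # Step 1: start from the coordinate directions.
    dirs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    start = list(x0)
    for _ in range(max_iter):
        pts = [start]
        for d in dirs:                   # Step 2: successive line minimizations
            pts.append(line_min(f, pts[-1], d))
        fs = [f(p) for p in pts]
        if fs[0] - fs[n] < delta:        # Step 3: stopping rule
            return pts[n], fs[n]
        drops = [fs[k - 1] - fs[k] for k in range(1, n + 1)]
        big = max(drops)                 # Step 4: largest single decrease
        m = drops.index(big)             # 0-based index of that direction
        f_ext = f([2.0 * a - b for a, b in zip(pts[n], pts[0])])
        if 0.5 * (fs[0] - 2.0 * fs[n] + f_ext) >= big:
            start = pts[n]               # Step 5: keep the old directions
        else:                            # Step 6: replace direction m by eta
            eta = [a - b for a, b in zip(pts[n], pts[0])]
            start = line_min(f, pts[n], eta)
            dirs = dirs[:m] + dirs[m + 1:] + [eta]
    return start, f(start)


if __name__ == "__main__":
    quad = lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2 + 0.5 * v[0] * v[1]
    print(powell(quad, [0.0, 0.0]))
```

For the convex quadratic in the usage example, setting the gradient to zero by hand gives the minimizer (1.6, -2.4), which the sketch should approach closely.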
(4). Roots identification of a polynomial by Jury’s method

In an ARMA model, the stationarity and invertibility of the series require all the roots of the two model polynomials

$$\Phi(z) = \sum_{k=0}^{p}\phi_k z^k, \qquad \Theta(z) = \sum_{k=0}^{q}\theta_k z^k$$

to be located outside the unit disc. A procedure for identifying whether some of the roots are in the unit disc was suggested by Jury (1964) and is very often used in time series analysis.

Suppose that $F(z)$ is a polynomial with real coefficients, represented as

$$F(z) = a_n z^n + a_{n-1}z^{n-1} + \cdots + a_1 z + a_0, \quad a_n > 0. \tag{IX.1.38}$$

Put

$$\begin{aligned}
b_k &= a_0 a_k - a_n a_{n-k}, & k &= 0, 1, \ldots, n-1, \\
c_k &= b_0 b_k - b_{n-1}b_{n-1-k}, & k &= 0, 1, \ldots, n-2, \\
d_k &= c_0 c_k - c_{n-2}c_{n-2-k}, & k &= 0, 1, \ldots, n-3, \\
&\ \ \vdots
\end{aligned} \tag{IX.1.39}$$

recursively, until only the indices $k = 0, 1, 2, 3$ remain, giving, say, $p_0, p_1, p_2, p_3$; then define

$$q_0 = p_0 p_0 - p_3 p_3, \qquad q_2 = p_0 p_2 - p_3 p_1. \tag{IX.1.40}$$

Now, we have the following theorem:

Theorem (Jury). Suppose that $F(z)$ is a polynomial with real coefficients as in (IX.1.38); then all the roots of $F(z)$ are located inside the unit disc if and only if

(1). $F(1) > 0$ and

$$F(-1)\ \begin{cases} < 0, & \text{if } n \text{ is an odd number;} \\ > 0, & \text{if } n \text{ is an even number;} \end{cases} \tag{IX.1.41}$$

(2).

$$|a_0| < a_n;\quad |b_0| > |b_{n-1}|;\quad |c_0| > |c_{n-2}|;\quad \ldots;\quad |q_0| > |q_2| \tag{IX.1.42}$$

are all true.

Corollary. If $a_n > a_{n-1} > \cdots > a_0 > 0$ holds, then all the roots of $F(z)$ are located in $|z| < 1$.
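The conditions of the theorem translate directly into a short routine. The sketch below is an illustration under stated assumptions: coefficients are passed in ascending order `[a_0, ..., a_n]`, the table is reduced row by row with the recursion $b_k = a_0 a_k - a_n a_{n-k}$ until three entries remain, and the function name `jury_inside_unit_disc` is hypothetical.

```python
def jury_inside_unit_disc(coeffs):
    """Return True if all roots of F(z) = sum_k coeffs[k] * z**k lie in |z| < 1.

    coeffs = [a_0, a_1, ..., a_n] with a_n > 0 (ascending powers).
    """
    a = [float(c) for c in coeffs]
    n = len(a) - 1
    if n < 1 or a[-1] <= 0.0:
        raise ValueError("need degree >= 1 and a_n > 0")

    f_plus = sum(a)                                           # F(1)
    f_minus = sum(c * (-1) ** k for k, c in enumerate(a))     # F(-1)
    if f_plus <= 0.0:
        return False
    if n % 2 == 1 and f_minus >= 0.0:                         # n odd: F(-1) < 0
        return False
    if n % 2 == 0 and f_minus <= 0.0:                         # n even: F(-1) > 0
        return False
    if abs(a[0]) >= a[-1]:                                    # |a_0| < a_n
        return False

    row = a
    while len(row) > 3:
        m = len(row) - 1
        # next row: r_k = row_0 * row_k - row_m * row_{m-k}, k = 0..m-1
        row = [row[0] * row[k] - row[m] * row[m - k] for k in range(m)]
        if abs(row[0]) <= abs(row[-1]):
            return False
    return True


if __name__ == "__main__":
    # (z - 0.5)(z + 0.3)(z - 0.2) = z^3 - 0.4 z^2 - 0.11 z + 0.03
    print(jury_inside_unit_disc([0.03, -0.11, -0.4, 1.0]))
    # (z - 2)(z - 0.5)(z + 0.3) = z^3 - 2.2 z^2 + 0.25 z + 0.3
    print(jury_inside_unit_disc([0.3, 0.25, -2.2, 1.0]))
```

The two cubics in the usage example were expanded by hand from their roots, so the expected answers (all roots inside, one root outside) are known in advance.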

Since we are interested in whether all of the roots of the model polynomials are outside the unit disc (not “inside”), we have to make a modification before applying the Jury theorem to our polynomials $\Phi(z)$ and $\Theta(z)$.

Put

$$\Phi^*(z) = \sum_{k=0}^{p}\phi_k z^{p-k}, \qquad \Theta^*(z) = \sum_{k=0}^{q}\theta_k z^{q-k}. \tag{IX.1.43}$$

Then

$$\Phi(z) \ne 0, \quad |z| \le 1,$$

if and only if

$$\Phi^*(z) \ne 0, \quad |z| \ge 1. \tag{IX.1.44}$$

Indeed, suppose that the polynomial is represented as

$$\Phi(z) = c\prod_{k=1}^{p}(z - z_k), \quad |z_k| > 1; \tag{IX.1.45}$$

then we may rewrite

$$\Phi^*(z) = \sum_{k=0}^{p}\phi_k z^{p-k} = z^p\sum_{k=0}^{p}\phi_k z^{-k} = z^p\,\Phi(1/z) = c\,z^p\prod_{k=1}^{p}\Big(\frac{1}{z} - z_k\Big), \tag{IX.1.46}$$

so that $\frac{1}{z} - z_k = 0$ if and only if $z = \frac{1}{z_k}$, and $|z_k| > 1$, i.e. $|z| < 1$.

A similar conclusion also holds for $\Theta^*(z)$.

If the orders of the model polynomials are not greater than 2, the following simple identifications are available:

(1). ARMA(1,1):

Let

$$\Phi(z) = 1 + \phi_1 z, \qquad \Theta(z) = \theta_0 + \theta_1 z;$$

then $\Phi(z)\Theta(z) \ne 0$, $|z| \le 1$, if and only if

$$\begin{cases} |\phi_1| < 1; \\ |\theta_1| < \theta_0. \end{cases} \tag{IX.1.47}$$

(2). AR(2):

Let the model polynomial be

$$\Phi(z) = 1 + \phi_1 z + \phi_2 z^2;$$

then $\Phi(z) \ne 0$, $|z| \le 1$, if and only if

$$\begin{cases} |\phi_2| < 1, \\ \phi_1 - \phi_2 < 1, \\ \phi_1 + \phi_2 > -1. \end{cases} \tag{IX.1.48}$$

(3). MA(2):

Suppose that the model equation is

$$x(t) = \theta_0\varepsilon(t) + \theta_1\varepsilon(t-1) + \theta_2\varepsilon(t-2);$$

then $\Theta(z) = \theta_0 + \theta_1 z + \theta_2 z^2 \ne 0$, $|z| \le 1$, if and only if

$$\begin{cases} |\theta_2| < \theta_0, \\ \theta_1 - \theta_2 < \theta_0, \\ \theta_1 + \theta_2 > -\theta_0. \end{cases} \tag{IX.1.49}$$

Evidently, (IX.1.49) can be obtained from (IX.1.48) by rewriting

$$\Theta(z) = \theta_0\Big(1 + \frac{\theta_1}{\theta_0}z + \frac{\theta_2}{\theta_0}z^2\Big),$$

and (IX.1.48) may be derived from conditions (IX.1.41), (IX.1.42) of the Jury theorem.
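The AR(2) conditions can be cross-checked numerically against the roots of $1 + \phi_1 z + \phi_2 z^2$. The sketch below is only a sanity check under the assumption that the three conditions read $|\phi_2| < 1$, $\phi_1 - \phi_2 < 1$, $\phi_1 + \phi_2 > -1$; the helper names are hypothetical, and the grid offsets are chosen so that no grid point falls on a boundary.

```python
import cmath


def ar2_conditions(phi1, phi2):
    """The three inequalities for Phi(z) = 1 + phi1*z + phi2*z**2."""
    return abs(phi2) < 1.0 and phi1 - phi2 < 1.0 and phi1 + phi2 > -1.0


def ar2_roots_outside(phi1, phi2):
    """True if both roots of 1 + phi1*z + phi2*z**2 satisfy |z| > 1 (phi2 != 0)."""
    disc = cmath.sqrt(phi1 * phi1 - 4.0 * phi2)
    roots = [(-phi1 + disc) / (2.0 * phi2), (-phi1 - disc) / (2.0 * phi2)]
    return all(abs(z) > 1.0 for z in roots)


def grid_agrees():
    """Compare both criteria on a grid that avoids phi2 = 0 and the boundaries."""
    for i in range(14):
        for j in range(10):
            phi1 = -1.93 + 0.3 * i
            phi2 = -1.45 + 0.3 * j
            if ar2_conditions(phi1, phi2) != ar2_roots_outside(phi1, phi2):
                return False
    return True


if __name__ == "__main__":
    print(grid_agrees())
```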

IX.1.2 Modelling and Forecasting for the Temperature in Shanghai. 

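Using the average monthly temperature records for 1952-1975 in Shanghai, 288 data in all, the researchers fitted a seasonal ARIMA model of the form

$$\Big(\sum_{k=0}^{p}\phi_k U^k\Big)\Big(\sum_{k=0}^{P}\Phi_k U^{kM}\Big)\nabla^d\nabla_M^D x(t) = \Big(\sum_{k=0}^{q}\theta_k U^k\Big)\Big(\sum_{k=0}^{Q}\Theta_k U^{kM}\Big)\varepsilon(t), \tag{IX.1.50}$$

where $M = 12$, $D = 1$, the leading coefficients are normalized to 1, and $\mathrm{Var}(\varepsilon(t))$ is the innovation variance.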


The fitting procedure is as follows:

1. Input the order parameters of the model $p, d, q, P, Q$.

2. Select the initial coefficient vector

$$\beta^{(0)} = (\phi_1^{0}, \ldots, \Theta_Q^{0})^T, \tag{IX.1.51}$$

which satisfies the stationarity and invertibility conditions (these can be checked by Jury’s identification).

3. Given $\delta > 0$, find the M.S.S.E. of the parameters of the model from equations (IX.1.24)-(IX.1.25) by Powell’s algorithm.

Remember that in each step of seeking the optimum value by Powell’s method, the selected parameters

$$\beta^{(i+1)} = \beta^{(i)} + \lambda_i\eta^{(i)} \tag{IX.1.52}$$

should satisfy the conditions of stationarity and invertibility, and should remain in the convex region of the function.

4. For different order parameters $(p, d, q, P, Q)$, repeat steps 2 and 3 to obtain the M.S.S.E. $\beta^*$ and $S^*$.

5. Put 

AIC(p 1 d i q i P t Q) = N\og (P + 9 + 尸 + Q + 1 + 〜 ,o) (IX.1.53) 

where [ • ] is the integral part, and 



when d ^ 0\ 
when d = 0. 


(IX.1.54) 


$N$ is the sample size after applying the difference operator $\nabla^d\nabla_{12}$.
After steps 1-5, an ARIMA$(p, d, q) \times (P, 1, Q)_{12}$ model is fitted to the 288 data of average monthly temperature records in Shanghai; the parameters are


$$d = 0, \quad q = 2, \quad D = 1, \quad Q = 1, \tag{IX.1.55}$$


<t>\ =0.00 ， = —0.19, 

= - 0.09, e 2 = o.oi ， 


(IX.1.56) 


The researchers then extended the data from 288 to 324 points and repeated the fitting procedure described above. All of the parameters, including the orders $p, d, q, P, Q$ and the model coefficients, are the same as in (IX.1.55) and (IX.1.56), which shows that the parameters selected for model fitting by steps 1-5 are quite stable.

The forecasting results are illustrated in Fig. IX.1.

Fig. IX.1 Average monthly temperature forecasting in Shanghai by seasonal ARIMA model.
IX.2 Outlier Analysis and Interpolation of Missing Data in a Measuring System


IX.2.1 Basic Knowledge on Outlier Analysis*. 

(1). Additive Outliers (AO) and Innovation Outliers (IO).

In many automatic data recording (digital or analog) systems, a very troublesome problem is outlier analysis, since many unexpected factors may lead to the appearance of an intervention, called an “electronic hiccup”. For instance, Fig. IX.2.1 is a computer output of water flow in the Xiang River with an outlier.

Fig. IX.2.1 Electronic hiccup appears in a river flow output record.

In the figure, we may see that the point $k = 14$ is an outlier value, a single point exerting no influence on the others. Such an intervention is called an “additive outlier” (AO).

An AO intervention model may usually be represented as

$$y(t) = x(t) + \beta\delta_{t,j}, \tag{IX.2.1}$$

where $x(t)$ is the underlying series, $\delta_{t,j} = 1$ when $t = j$ and $= 0$ otherwise, and $\beta$ is the additive value.


* Readers may find the relevant knowledge in the book of Wei et al (1991) 
Fig. IX.2.2 An IO occurring at $k = 6$ in the output of a water flow recording system.

Another kind of intervention is not confined to a single point: its error spreads to the later values of the series. It is called an “innovation outlier” (IO) (see Fig. IX.2.2).

In Fig. IX.2.2, we may see intuitively that the river flow data in the months $k = 6, 7, 8$ are abnormal, since large values are usually recorded in summer time.

Mathematically, we may consider the IO model as

$$y(t) = x(t) + \beta_1\delta_{t,j} + \beta_2\delta_{t,j+1} + \cdots, \tag{IX.2.2}$$

where $\{\beta_i\}$ are unknown parameters satisfying some convergence conditions.

Suppose that $x(t)$ is a regular series (see Chapter 2); then it may be represented as a Wold series (i.e. innovation series):

$$x(t) = \sum_{k=0}^{\infty} c_k\varepsilon(t-k), \quad c_0 = 1; \tag{IX.2.3}$$

then a simplified IO model is represented as

$$y(t) = \sum_{k=0}^{\infty} c_k\big(\varepsilon(t-k) + \gamma\delta_{t-k,j}\big). \tag{IX.2.4}$$

When $t = j, j+1, \ldots$,

$$\begin{aligned} y(j) &= x(j) + \gamma, \\ y(j+k) &= x(j+k) + c_k\gamma, \quad k = 1, 2, \ldots, \end{aligned} \tag{IX.2.5}$$

and since $c_k \searrow 0$, the IO influences the later values of the series in a decreasing way.

Outlier analysis researchers believe that a large portion of the outliers likely to be found in practice follow the AO and IO intervention models.
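The difference between the two outlier types is easy to see on a simulated AR(1) series, for which the Wold weights are $c_k = \phi_1^k$. The sketch below is illustrative only: the AR(1) choice, the function names and the numerical values are assumptions made for the example and do not come from the river flow records.

```python
def ar1_path(phi1, eps):
    """x(t) = phi1 * x(t-1) + eps(t), started from x(-1) = 0."""
    x, prev = [], 0.0
    for e in eps:
        prev = phi1 * prev + e
        x.append(prev)
    return x


def with_additive_outlier(x, j, beta):
    """AO: only the single observation at time j is shifted by beta."""
    y = list(x)
    y[j] += beta
    return y


def with_innovation_outlier(phi1, eps, j, gamma):
    """IO: the innovation at time j is shifted by gamma and then propagates."""
    shocked = list(eps)
    shocked[j] += gamma
    return ar1_path(phi1, shocked)


if __name__ == "__main__":
    eps = [0.3, -0.1, 0.2, 0.0, -0.4, 0.1]
    x = ar1_path(0.5, eps)
    ao = with_additive_outlier(x, 2, 3.0)
    io = with_innovation_outlier(0.5, eps, 2, 3.0)
    print([round(a - b, 6) for a, b in zip(ao, x)])  # effect confined to t = 2
    print([round(a - b, 6) for a, b in zip(io, x)])  # effect decays as 0.5**k * 3
```

Subtracting the clean path isolates the intervention: the AO leaves a single spike, while the IO leaves the geometrically decaying trace $c_k\gamma$.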


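(2). Score test for outlier analysis under the AR(p) model.

Suppose that the underlying series $x(t)$ is an AR(p) model, which follows the equation

$$x(t) = \sum_{k=1}^{p}\phi_k x(t-k) + \varepsilon(t), \qquad (\phi_k = \phi_k^{(p)}), \tag{IX.2.6}$$

then the AO and IO intervention models

$$y(t) = x(t) + \beta\delta_{t,j}, \qquad y(t) = \sum_{k=0}^{\infty} c_k\big(\varepsilon(t-k) + \alpha\delta_{t-k,j}\big) \tag{IX.2.7}$$

may be combined into a mixed intervention model

$$\begin{cases} y(t) = x(t) + \beta\delta_{t,j}, \\ \Phi(U)x(t) = \varepsilon(t) + \alpha\delta_{t,j}. \end{cases} \tag{IX.2.8}$$

Then the detection of an outlier, say at $t = k$, may be judged by the following hypothesis test:

$$H_0: \alpha = \beta = 0; \qquad H_1: (\alpha \ne 0) \cup (\beta \ne 0). \tag{IX.2.9}$$

Now, suppose that the model parameters of $x(t)$, i.e. $\phi$, $p$, $\sigma^2$, are known, and $\varepsilon(t)$ is an iid $N(0, \sigma^2)$ series; then the log-likelihood function is

$$L(\phi, \alpha, \beta \mid y(1), \ldots, y(N)) = -\frac{N}{2}\log 2\pi - \frac{N}{2}\log\sigma^2 + \frac{1}{2}\log\det M_p - \frac{S(\phi, \alpha, \beta)}{2\sigma^2}, \tag{IX.2.10}$$

where

$$\begin{aligned} S(\phi, \alpha, \beta) = {} & \sum_{i=1}^{p}\sum_{j=1}^{p} m_{i,j}\,(y(i) - \beta\delta_{i,k})(y(j) - \beta\delta_{j,k}) \\ & + \sum_{t=p+1}^{N}\Big[(y(t) - \beta\delta_{t,k}) - \sum_{r=1}^{p}\phi_r\big(y(t-r) - \beta\delta_{t-r,k}\big) - \alpha\delta_{t,k}\Big]^2, \end{aligned} \tag{IX.2.11}$$

$$M_p^{-1} = \frac{1}{\sigma^2}\begin{pmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{p-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{p-1} & \gamma_{p-2} & \cdots & \gamma_0 \end{pmatrix}, \qquad M_p = \{m_{i,j}\}_{p \times p}, \tag{IX.2.12}$$

and $\gamma_k$ is the covariance function (see (IX.1.11), (IX.1.12)).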

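Now, suppose $k > p$; then under $H_0$, the hypothesis test for $t = k$ by the Score method is based on the following theorem:

Theorem (Score testing). Suppose that the intervention model is (IX.2.8), where $x(t)$ follows the model

$$x(t) = \sum_{j=1}^{p}\phi_j x(t-j) + \varepsilon(t) + \alpha\delta_{t,k}, \tag{IX.2.13}$$

where $\varepsilon(t)$ is an iid $N(0, \sigma^2)$ series and $\phi$, $p$, $\sigma^2$ are known parameters. Then under $H_0$, for $k > p$, the Score statistic

$$SC_k = \Big(\sigma^2\sum_{i=1}^{p}\phi_i^2\Big)^{-1}\Big(\sum_{i=1}^{p}\phi_i\varepsilon(k+i)\Big)^2 + \frac{\varepsilon^2(k)}{\sigma^2} \tag{IX.2.14}$$

follows a $\chi^2(2)$ distribution, with

$$\varepsilon(t) = \Phi(U)y(t), \quad t = k, k+1, \ldots, k+p. \tag{IX.2.15}$$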
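A direct computation of the Score statistic may make the theorem more concrete. The sketch below assumes the statistic has the form $SC_k = \varepsilon^2(k)/\sigma^2 + \big(\sum_{i=1}^{p}\phi_i\varepsilon(k+i)\big)^2 / \big(\sigma^2\sum_{i=1}^{p}\phi_i^2\big)$ with $\varepsilon(t) = y(t) - \sum_{r}\phi_r y(t-r)$; the function name and the 0-based indexing are choices made for this example.

```python
def score_statistic(y, phi, sigma2, k):
    """Score statistic SC_k for a possible outlier at 0-based position k.

    y      : observed series (list of floats)
    phi    : AR coefficients [phi_1, ..., phi_p] of x(t) = sum phi_r x(t-r) + eps(t)
    sigma2 : innovation variance (assumed known)
    Requires p <= k and k + p <= len(y) - 1.
    """
    p = len(phi)
    if k < p or k + p > len(y) - 1:
        raise ValueError("k must leave p values on both sides")

    def resid(t):
        # eps(t) = Phi(U) y(t): one-step residual of the fitted AR(p) model
        return y[t] - sum(phi[r - 1] * y[t - r] for r in range(1, p + 1))

    b = sum(phi[i - 1] * resid(k + i) for i in range(1, p + 1))
    s2 = sum(c * c for c in phi)
    return (resid(k) ** 2 + b * b / s2) / sigma2


if __name__ == "__main__":
    # isolated spike of size 2 at k = 2 in an otherwise zero AR(1) record
    print(score_statistic([0.0, 0.0, 2.0, 0.0, 0.0], [0.5], 1.0, 2))
```

In use, the value would be compared with the $\chi^2(2)$ critical value at the chosen significance level (5.991 at level 0.05).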


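This theorem may be easily derived from the following result (see Cox and Hinkley (1974), Wei et al (1991)):

Under the hypothesis $H_0$, the Score statistic is

$$SC = \Big(\frac{\partial L}{\partial\beta}, \frac{\partial L}{\partial\alpha}\Big)\,J^{-1}\,\Big(\frac{\partial L}{\partial\beta}, \frac{\partial L}{\partial\alpha}\Big)^T\,\Big|_{\alpha=\beta=0}, \tag{IX.2.16}$$

where $J$ is Fisher’s information matrix.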

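In fact, from (IX.2.10)-(IX.2.12), we have (since $k > p$, no $\alpha$, $\beta$ are involved in the first double summation in (IX.2.11))

$$\frac{\partial L}{\partial\beta} = -\frac{1}{2\sigma^2}\frac{\partial}{\partial\beta}\sum_{t=p+1}^{N}\Big[(y(t) - \beta\delta_{t,k}) - \sum_{r=1}^{p}\phi_r\big(y(t-r) - \beta\delta_{t-r,k}\big) - \alpha\delta_{t,k}\Big]^2 \tag{IX.2.17}$$

$$= -\frac{1}{\sigma^2}\sum_{t=k}^{k+p}\Big(-\delta_{t,k} + \sum_{r=1}^{p}\phi_r\delta_{t-r,k}\Big)\Big[(y(t) - \beta\delta_{t,k}) - \sum_{r=1}^{p}\phi_r\big(y(t-r) - \beta\delta_{t-r,k}\big) - \alpha\delta_{t,k}\Big], \tag{IX.2.18}$$

$$\frac{\partial L}{\partial\alpha} = \frac{1}{\sigma^2}\Big[(y(k) - \beta) - \sum_{r=1}^{p}\phi_r y(k-r) - \alpha\Big]. \tag{IX.2.19}$$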


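Upon putting $\alpha = \beta = 0$, it follows that

$$\begin{aligned}
\Big(\frac{\partial L}{\partial\beta}\Big)_{\alpha=\beta=0}
&= -\frac{1}{\sigma^2}\sum_{t=k}^{k+p}\Big(-\delta_{t,k} + \sum_{r=1}^{p}\phi_r\delta_{t-r,k}\Big)\Big(y(t) - \sum_{r=1}^{p}\phi_r y(t-r)\Big) \\
&= -\frac{1}{\sigma^2}\sum_{i=0}^{p}\phi_i\Big(y(k+i) - \sum_{r=1}^{p}\phi_r y(k+i-r)\Big)
= -\frac{1}{\sigma^2}\sum_{i=0}^{p}\phi_i\,\varepsilon(k+i)
\end{aligned} \tag{IX.2.20}$$

(we define $\phi_0 = -1$), since

$$-\delta_{t,k} + \sum_{r=1}^{p}\phi_r\delta_{t-r,k} = \begin{cases} -1, & \text{when } t = k; \\ \phi_{t-k}, & \text{when } k < t \le k+p; \\ 0, & \text{when } t > k+p. \end{cases}$$

Similar results may be obtained:

$$\Big(\frac{\partial L}{\partial\alpha}\Big)_{\alpha=\beta=0} = \frac{1}{\sigma^2}\Big(y(k) - \sum_{r=1}^{p}\phi_r y(k-r)\Big) = \frac{1}{\sigma^2}\varepsilon(k), \tag{IX.2.21}$$

$$\frac{\partial^2 L}{\partial\beta^2} = -\frac{1}{\sigma^2}\sum_{i=0}^{p}\phi_i^2, \tag{IX.2.22}$$

$$\frac{\partial^2 L}{\partial\beta\,\partial\alpha} = -\frac{1}{\sigma^2}, \tag{IX.2.23}$$

$$\frac{\partial^2 L}{\partial\alpha^2} = -\frac{1}{\sigma^2}. \tag{IX.2.24}$$

So that Fisher’s information matrix now is

$$J = -E\begin{pmatrix} \dfrac{\partial^2 L}{\partial\beta^2} & \dfrac{\partial^2 L}{\partial\beta\,\partial\alpha} \\[2mm] \dfrac{\partial^2 L}{\partial\alpha\,\partial\beta} & \dfrac{\partial^2 L}{\partial\alpha^2} \end{pmatrix} = \frac{1}{\sigma^2}\begin{pmatrix} \sum_{i=0}^{p}\phi_i^2 & 1 \\ 1 & 1 \end{pmatrix}$$

and

$$J^{-1} = \frac{\sigma^2}{\sum_{i=1}^{p}\phi_i^2}\begin{pmatrix} 1 & -1 \\ -1 & \sum_{i=0}^{p}\phi_i^2 \end{pmatrix}, \qquad \phi_0 = -1. \tag{IX.2.25}$$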

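Putting (IX.2.20)-(IX.2.25) into the SC statistic (IX.2.16), we have

$$SC_k = \Big(\sigma^2\sum_{i=1}^{p}\phi_i^2\Big)^{-1}\Big[\Big(\sum_{i=0}^{p}\phi_i\varepsilon(k+i)\Big)^2 + 2\Big(\sum_{i=0}^{p}\phi_i\varepsilon(k+i)\Big)\varepsilon(k) + \varepsilon^2(k)\sum_{i=0}^{p}\phi_i^2\Big]. \tag{IX.2.26}$$

Now, we rewrite

$$\Big(\sum_{i=0}^{p}\phi_i\varepsilon(k+i)\Big)^2 = \Big(\sum_{i=1}^{p}\phi_i\varepsilon(k+i)\Big)^2 - 2\varepsilon(k)\sum_{i=1}^{p}\phi_i\varepsilon(k+i) + \varepsilon^2(k), \tag{IX.2.27}$$

$$2\Big(\sum_{i=0}^{p}\phi_i\varepsilon(k+i)\Big)\varepsilon(k) = 2\varepsilon(k)\sum_{i=1}^{p}\phi_i\varepsilon(k+i) - 2\varepsilon^2(k), \tag{IX.2.28}$$

and put these back into (IX.2.26); it follows that

$$SC_k = \Big(\sigma^2\sum_{i=1}^{p}\phi_i^2\Big)^{-1}\Big[\Big(\sum_{i=1}^{p}\phi_i\varepsilon(k+i)\Big)^2 + \varepsilon^2(k)\sum_{i=1}^{p}\phi_i^2\Big] \tag{IX.2.29}$$

$$= \Big(\sigma^2\sum_{i=1}^{p}\phi_i^2\Big)^{-1}\Big(\sum_{i=1}^{p}\phi_i\varepsilon(k+i)\Big)^2 + \frac{\varepsilon^2(k)}{\sigma^2}, \tag{IX.2.30}$$

which is just the statistic (IX.2.14).

Again, since $\varepsilon(t)$ is an iid $N(0, \sigma^2)$ series,

$$\frac{\varepsilon^2(k)}{\sigma^2} \sim \chi^2(1), \qquad \frac{\big(\sum_{i=1}^{p}\phi_i\varepsilon(k+i)\big)^2}{\sigma^2\sum_{i=1}^{p}\phi_i^2} \sim \chi^2(1), \tag{IX.2.31}$$

and $\varepsilon^2(k)/\sigma^2$ is independent of $\sum_{i=1}^{p}\phi_i\varepsilon(k+i)$, so that by (IX.2.30),

$$SC_k \sim \chi^2(2) \tag{IX.2.32}$$

holds.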



IX.2.2 Interpolation for Missing Data for AR(p) Model. 
After the outliers have been detected, the next step is to solve the problem “how to replace these outliers by appropriate values?” This topic in time series analysis is related to the interpolation of missing data.

Suppose that the underlying series $x(t)$ in the observation is an AR(p) model

$$x(t) = \sum_{r=1}^{p}\phi_r x(t-r) + \varepsilon(t), \tag{IX.2.33}$$

where $\varepsilon(t)$ are iid $N(0, \sigma^2)$ and the parameters of the model (IX.2.33) are known. Let $\{x(j)\}_1^N$ be observations involving missing data

$$X_M = \{x(n_j)\}_1^m \subset \{x(j)\}_1^N, \tag{IX.2.34}$$

where $p < n_1 < n_2 < \cdots < n_m < N - p + 1$.

Put

$$M = \{n_1, n_2, \ldots, n_m\}, \qquad K = \{k_1, k_2, \ldots, k_r\}, \tag{IX.2.35}$$

where

$$\{x(k),\ k \in K\} = \{x(j)\}_1^N \setminus X_M. \tag{IX.2.36}$$

Then we have the following theorem:

Theorem (Interpolation for missing data for AR(p)). Suppose that $x(t)$ is an AR(p) model; under the conditions assumed in (IX.2.33)-(IX.2.36), the linear equation

$$\sum_{i \in M} a_{i-j}\,\hat{x}(i) = -\sum_{i \in K} a_{i-j}\,x(i), \quad j \in M, \tag{IX.2.37}$$

has a unique solution

$$\hat{X}_M = \{\hat{x}(i),\ i \in M\},$$

and

$$\hat{X}_M = E[X_M \mid X_K, \phi, \sigma^2] \tag{IX.2.38}$$

is the maximum likelihood estimator, where ($\phi_0 = -1$, and $a_k = 0$ for $|k| > p$)

$$a_k = a_{-k} = \sum_{j=0}^{p-k}\phi_j\phi_{j+k}, \quad k = 0, 1, \ldots, p. \tag{IX.2.39}$$
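The linear system of the theorem is small and can be solved directly. The sketch below assumes the system reads $\sum_{i \in M} a_{i-j}\hat{x}(i) = -\sum_{i \in K} a_{i-j}x(i)$ for $j \in M$, with $a_k = \sum_{j=0}^{p-k}\phi_j\phi_{j+k}$, $\phi_0 = -1$ and $a_k = 0$ for $|k| > p$. The function names, the 0-based indexing and the plain Gaussian elimination are choices made for this illustration.

```python
def interp_coeffs(phi):
    """a_k = sum_{j=0}^{p-k} phi_j * phi_{j+k}, k = 0..p, with phi_0 = -1."""
    full = [-1.0] + list(phi)
    p = len(phi)
    return [sum(full[j] * full[j + k] for j in range(p - k + 1)) for k in range(p + 1)]


def solve_linear(a_mat, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(a_mat, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def interpolate_missing(x, missing, phi):
    """Estimate x at the 0-based indices in `missing` for a zero-mean AR(p) series.

    Entries of x at missing positions are ignored (they may be None).
    Assumes every missing index has at least p values on each side of the record.
    """
    a = interp_coeffs(phi)
    p = len(phi)
    coef = lambda d: a[abs(d)] if abs(d) <= p else 0.0
    m_idx = sorted(missing)
    known = [i for i in range(len(x)) if i not in set(m_idx)]
    a_mat = [[coef(i - j) for i in m_idx] for j in m_idx]
    rhs = [-sum(coef(i - j) * x[i] for i in known) for j in m_idx]
    return dict(zip(m_idx, solve_linear(a_mat, rhs)))


if __name__ == "__main__":
    print(interpolate_missing([1.0, 2.0, None, 4.0, 5.0], [2], [0.5]))
```

For AR(1) with a single missing point the system collapses to $\hat{x}(m) = \frac{\phi_1}{1+\phi_1^2}\big(x(m-1) + x(m+1)\big)$, which is the value the usage example produces.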
It follows from this theorem that the interpolations for a single point for the AR(p) model are:

$$E(x(m) \mid X_K, \phi, \sigma^2) = \frac{\phi_1}{1 + \phi_1^2}\big(x(m-1) + x(m+1)\big), \quad \text{when } p = 1, \tag{IX.2.40}$$

$$E(x(m) \mid X_K, \phi, \sigma^2) = \frac{\phi_1(1 - \phi_2)}{1 + \phi_1^2 + \phi_2^2}\big(x(m-1) + x(m+1)\big) + \frac{\phi_2}{1 + \phi_1^2 + \phi_2^2}\big(x(m-2) + x(m+2)\big), \quad \text{when } p = 2; \tag{IX.2.41}$$

more generally, we have

$$E(x(m) \mid X_K, \phi, \sigma^2) = -\frac{1}{a_0}\sum_{s=1}^{p} a_s\{x(m-s) + x(m+s)\} \tag{IX.2.42}$$

for the AR(p) model, with $a_s$ as in (IX.2.39).

For interpolating a segment of missing data, for example in an AR(1), let

$$\begin{aligned} X_K &= (x(1), \ldots, x(s-1), x(s+m), \ldots, x(N)), \\ X_M &= (x(s), x(s+1), \ldots, x(s+m-1)) \end{aligned} \tag{IX.2.43}$$

be the sets of known and missing data respectively; then the estimates are

$$\hat{x}(s-1+j) = \frac{1}{1 - \phi_1^{2(m+1)}}\Big[\big(\phi_1^{j} - \phi_1^{2(m+1)-j}\big)x(s-1) + \big(\phi_1^{m+1-j} - \phi_1^{m+1+j}\big)x(s+m)\Big], \quad j = 1, 2, \ldots, m. \tag{IX.2.44}$$

For AR(p), $p > 1$, the interpolation formula may be obtained by solving the linear equation (IX.2.37).

If the series does not have zero mean, then it is necessary to center it,

$$\tilde{x}(t) = x(t) - \hat{\mu}, \qquad \hat{\mu} = \frac{1}{N - m}\sum_{j=1}^{N-m} x(k_j), \tag{IX.2.45}$$

before using the formulas mentioned above.

IX.2.3 Practical Application for a Range Measuring System. 
A range measuring system needs outlier analysis for its output, and the outliers are rectified by the optimum interpolation procedure. Fortunately, the data analysis is not required to be done on line, and the error percentage of the output, in the general case, is not more than 5%.

The processing is carried out segment by segment and may be stated as follows.

Step 1. Suppose that $Z = \{z(t), t = 1, 2, \ldots, N\}$ is a segment of the output data; it may be decomposed into two subsegments

$$Z = \{z(k), k = 1, 2, \ldots, L\} \cup \{z(r), r = L+1, \ldots, N\} = Z^{(1)} \cup Z^{(2)}; \tag{IX.2.46}$$

or

$$Z = \{z(k), k = 1, 2, \ldots, N-L\} \cup \{z(r), r = N-L+1, \ldots, N\} = Z^{(1)} \cup Z^{(2)}; \tag{IX.2.47}$$

where $Z^{(1)}$ or $Z^{(2)}$ does not involve outliers (since the error percentage is not more than 5%, such a decomposition is not difficult), and in our case we have $120 \le N \le 250$, $L = 40$. For convenience of statement, we assume that $Z^{(1)}$ is the set which contains no erroneous output.

Step 2. If a linear trend exists in the output data $\{z(t)\}$, then remove the trend by subtracting the linear regression function $\mathrm{Reg}(t)$:

$$y(t) = z(t) - \mathrm{Reg}(t). \tag{IX.2.48}$$

Or, to much the same effect, take a first order difference

$$y(t+1) = \nabla z(t+1) = z(t+1) - z(t) \tag{IX.2.49}$$

before carrying out the outlier analysis.

In our case we use the latter procedure.

Step 3. Put $y(t) = \nabla z(t)$, $z(t) \in Z^{(1)}$, and fit an AR(p) model for $y(t)$,

$$y(t) = \sum_{k=1}^{p}\phi_k y(t-k) + \theta_0\varepsilon(t), \tag{IX.2.50}$$

with the Burg algorithm and BIC order selection (see Chapter 2).
Step 4. For $y(t) = \nabla z(t)$, $z(t) \in Z^{(2)}$, carry out the Score test (see (IX.2.14)-(IX.2.15)), where the parameters are determined by the fitted model (IX.2.50).

Step 5. After picking out the outliers, replace those values by interpolation using formulas (IX.2.42)-(IX.2.44), or more generally, (IX.2.37).

For example, for a segment of the output of our range measuring system, we have a subsegment

$$Z^{(1)} = \{y(t) = \nabla z(t),\ t = 2, 3, \ldots, 45\} \tag{IX.2.51}$$

with no outlier involved in $Z^{(1)}$. Then for $y(t)$, an AR(1) model is fitted:

$$y(t) = \bar{y} + 0.1543\,(y(t-1) - \bar{y}) + 0.3618\,\varepsilon(t), \tag{IX.2.52}$$

where

$$\bar{y} = \frac{1}{44}\sum_{t=2}^{45} y(t) = 9.533. \tag{IX.2.53}$$

The Score test showed that in the set

$$Z^{(2)} = \{y(t) = \nabla z(t),\ t = 46, 47, \ldots, 120\} \tag{IX.2.54}$$

the values $\{y(t),\ t = 49, 56, 57, 58, 59\}$ are outliers, under the significance level $\alpha = 0.05$, with critical value $\chi^2(2) = 5.991$.

By the interpolation formula (IX.2.40), we have

$$\hat{y}(49) = 9.533 + \frac{0.1543}{1 + 0.1543^2}\big((y(48) - 9.533) + (y(50) - 9.533)\big) = 9.4706.$$

For $k = 56, 57, 58, 59$, the rectified values are $\hat{y}(56) = 9.605$, $\hat{y}(57) = 9.542$, $\hat{y}(58) = 9.519$, $\hat{y}(59) = 9.435$.

Some related data are presented in Table IX.2.1.

Comparing with the original output data, we find that only the values of $z(t)$ at $t = 49, 50, 57$ need rectification.
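The rectification of a single differenced value can be retraced with a few lines of arithmetic. The sketch below applies the single-point AR(1) interpolation around the fitted mean and then rebuilds the level by adding the rectified difference to the previous level. The inputs are the one-decimal values listed in Table IX.2.1, so the result need not coincide exactly with the quoted 9.4706, which was presumably computed from unrounded differences; the function name is hypothetical.

```python
def interp_single_ar1(phi1, mean, y_prev, y_next):
    """Single-point AR(1) interpolation for a series with non-zero mean."""
    weight = phi1 / (1.0 + phi1 ** 2)
    return mean + weight * ((y_prev - mean) + (y_next - mean))


# Values taken from the worked example: phi1 = 0.1543, mean = 9.533,
# y(48) = 9.4 and y(50) = 9.2 as tabulated (rounded to one decimal).
y49_hat = interp_single_ar1(0.1543, 9.533, 9.4, 9.2)

# Rebuild the level: z_hat(49) = z(48) + y_hat(49), with z(48) = 8874.5.
z49_hat = 8874.5 + y49_hat

if __name__ == "__main__":
    print(round(y49_hat, 4), round(z49_hat, 2))
```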



Table IX.2.1

  t      z(t)        y(t)        SC       rectified y    rectified z
 48     8874.5        9.4       4.765        ***           8874.5
 49     8884.8       10.3       6.292        9.47          8883.97
 50     8894.0        9.2       1.814        ***           8893.17
 ...
 55     8942.1       10.0       2.196        ***           8942.1
 56     8951.4        9.3       5.9E7        9.605         8951.7
 57     6166.3    -2785.1       1.39E8       9.542         8961.2
 58     8970.7     2804.4       8.09E7       9.519         8970.7
 59     8980.3        9.6       1.4E6        9.435         8980.2
 60     8989.2        8.9       4.146        ***           8989.1
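This book is a monograph on empirical studies in which time series analysis is applied to real problems. It is divided into two parts. The first part briefly introduces the basic theory and methods of time series analysis, which form the background needed to follow the case studies. The second part consists of the case studies, from which the reader can see how widely time series analysis is applied in practice and how it becomes the core tool for solving a variety of problems. The cases include how Chinese scientists discovered the rings of Uranus from their own observation records, how filtering theory was applied to gravity prospecting in the East China Sea and the Yellow Sea, how spectral analysis distinguishes the EEG characteristics of children with Down's syndrome, how the K-L information of multivariate spectra was applied to detecting the physiological characteristics of outstanding pilots, how hidden periodicity analysis revealed that an isolated pituitary gland still shows rhythmic endocrine cycles, how prediction theory was applied to meteorological modelling and forecasting, and many other interesting and real research cases. These results earned the author the National Natural Science Award of China and several other awards at home and abroad.

By studying this book, readers can learn not only the basic theory and methods of time series analysis but, more importantly, how to turn a practical problem into a mathematical problem and then solve it with mathematical and statistical theory and methods, including the final return to practice and verification with experimental data.

This book can serve as a teaching reference or supplementary text for undergraduate and graduate students in applied time series analysis, and it is also a valuable reference for applied statisticians and for scientists and engineers in related fields.

The history of time series analysis shows that it has gained tremendous impetus from interdisciplinary work across science and technology. This book contains a delightful collection of real case studies carried out in China. The case studies include geophysical exploration, Down's syndrome, hormone release, the detection of the rings of Uranus, and the forecasting of freight and river flow, among others. The case studies are preceded by a concise introduction to the theory and methods of time series. For any rigorous course on time series analysis, this book would be very good supplementary material.

H. Tong, Short Book Reviews, Vol. 14, No. 2, 1994

For those who use time series modelling and are familiar with the basic concepts, this is an excellent reference book. The first part of the book briefly reviews the basic concepts, but this introduction is rather condensed and technical.

The real importance of the book lies in its systematic presentation of case studies from many interesting and wide-ranging fields, including the earth sciences, medical sciences and engineering sciences. The book will be welcome, and it also complements the many time series books already on the market.

Sat Narain Gupta, Mathematical Review, Vol. 4, No. 5, 1995

This book is very interesting and can be recommended as supplementary material for a course on time series analysis.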